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SMALL AREA ESTIMATION USING A MULTINOMIAL LOGIT MIXED 
MODEL WITH CATEGORY SPECIFIC RANDOM EFFECTS 


Janice Scealy 
Australian Bureau of Statistics 
and Australian National University 


ABSTRACT 


This paper describes a model based approach to producing small area estimates of 
counts for different categories of the Australian labour force based on a multinomial 
logit mixed model with category specific random effects. By category specific we 
mean that within each small area there are two correlated random effects, one 
associated with the employed category and the other associated with the unemployed 
category. Estimates of the model parameters are produced using penalized 
quasi-likelihood combined with approximated restricted maximum likelihood 
estimation and using these, estimated counts are then produced for each small area. 
Mean squared error estimates of the estimated counts are approximated using two 
methods: 1) a parametric bootstrap and 2) analytical approximations and we compare 
the performance of both. Using a parametric bootstrap we also examine the 
properties of the combined penalized quasi-likelihood and restricted maximum 
likelihood estimators and discuss model goodness of fit measures and diagnostics. 


Keywords: small area estimation, multinomial logit mixed model, parametric 


bootstrap, labour force survey. 
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1. INTRODUCTION 


The Australian Bureau of Statistics (ABS) produces labour force estimates using direct 
survey estimators for regions with large enough sample sizes for these estimators to 
be reliable. In recent years there has been a growing demand for labour force 
estimates to be produced in smaller geographical regions. The direct estimates for 
these small regions are considered to be too unreliable because the standard errors 
are large due to small sample sizes. A way around this is to produce model based 
estimates which borrow strength from administrative and Census data and other types 
of auxiliary variables. The hope is that the model based estimators will produce 
estimates with mean squared error less than the direct survey estimators. 


The model based approach relies on an appropriate choice of model and good 
auxiliary variables. The aim here is to produce estimates for each of three labour force 
statuses: employment, unemployment and not in the labour force for a set of small 
areas. Auxiliary data are available within age/sex classes for each small area and sample 
counts of the three labour force statuses are obtained from the Australian Labour 
Force Survey. The total numbers of people within each sex/age group are also 
assumed known and are obtained from the Estimated Resident Population (ERP) 
projections published by the ABS (for further details, see ABS, 2007). Molina et al. 
(2007) describe a methodology based on the application of the multinomial logit 
mixed model which can be used to produce estimates in this small area estimation 
situation. Random area effects are included in the models to account for potential 
correlations between the age/sex class counts in the small areas not explained by the 
auxiliary variables. The inclusion of random area effects in the model specifically 
accounts for the area level variation not explained by the auxiliary variables. 


In the model described in Molina et al. (2007), only one random area effect is used 
within each small area and the random effect is therefore the same across the 
multinomial classes. In our situation this may not be appropriate. Some work carried 
out at the ABS on fitting three separate logistic mixed models to the data suggests that 
the variances of the random effects are not the same across each category. A more 
appropriate model would be to introduce category specific random effects. This 
allows for the variances of the random effects to differ between the categories and also 
allows for a potential correlation between them as well. In our case it does not make 
sense to assume that the category specific random effects are perfectly correlated. By 
allowing for a general arbitrary covariance matrix, Hartzel et al. (2001) make the point 
that the model will be structurally the same regardless of the choice of baseline 
category which is a good property. 
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In this paper we extend the model in Molina et a/. (2007) to include category specific 
random effects. To estimate model parameters, we develop a similar penalized 
quasi-likelihood (PQL) estimation scheme with approximate maximum likelihood 
(ML) and/or restricted maximum likelihood (REML) for the variance components. 
These parameter estimates are then used to produce estimated labour force counts 
for each small area. Mean squared error estimates of the estimated counts are 
approximated using two methods: 


1. aparametric bootstrap, and 
2. analytical approximations, 


and we compare the estimates produced using these two methods. Using a 


parametric bootstrap, we also examine the properties of the combined PQL and REML 


estimators and discuss model goodness of fit. Unlike Molina et al. (2007), we also 
consider estimation for out-of-sample small areas and briefly review some alternative 
estimation schemes. Note that the primary focus of this paper is to give technical 
details on how one might produce model based estimates for the Australian labour 
force. This is an experimental procedure and the ABS will not be publishing any 
model based estimates for the Australian labour force as part of the ABS product at 
this stage. 
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2. THE MODEL 


Similar to Molina et al. (2007) let index 7 (¢ = 1, 2, ...,) denote the sex/age groups and 
d (d=1,2,...,D) denote the small areas. We have a slightly different set up because Ig 
is not constant (gq < 10). The reason why we have unbalanced data is because some 
of the sex/age groups within a small area have no sample and will be excluded from 
model estimation since they do not contribute to the likelihood. The labour force 
sample counts are denoted by yg, yai2 and yqz which represent respectively, 
employment, unemployment and not in the labour force counts in sex/age group 7 in 
the d-th small area. Let mg =Vai1 +Vdi2 +Vai3 denote the sample size and pat, Pai2 
and p,jz denote the respective probabilities of employed, unemployed and not in the 
labour force. Let 2g, and wg2 denote the category specific random effects. We 
assume that the vectors (ygi1,Vdi2,Vai3)' given mg and ug =(Ugi,Uq2)! are 
independent across d and 7 with multinomial distribution, that is with the probability 
density function 


Ma)! : : Vai 
f Vai Vai2 | Ya) = ——*"__ Daft Pals Bal (2.1) 
Vir V diz: V dir" 


It is also assumed that for 7 = 1,2 


Pai 


log = x4 Bj + Ug (2.2) 


Pai 
where f; is a vector of parameters and xq is a vector of explanatory variables 
associated with the j-th category. Note that in (2.1) above, technically we should also 
be conditioning on xqj, that is, fshould be defined as flyai1,Vai2 |Ud,Xay). It will be 
assumed throughout this paper that whenever we condition on wg we also condition 
on xj even when it is omitted from the notation. We also assume that aq is 
independently and identically distributed as bivariate normal and its probability 
density function is 


Tipe ei 
1 —sUuygwy uy 
f(ug)=———e ? 
2n|Wa )? 
where Wa= “ a 
Yi2 P2 


Under this model we have a vector of variance components 9 = (91, 92, 912)’ that will 
need to be estimated along with B= (64, 5)’. This model is a GLMM (generalised 
linear mixed model) and to estimate the parameters a variety of different techniques 
can be used. In the next section we discuss some of these. 
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3. ESTIMATION OF ZAND 


Let ya =(Vai1,Vai2)’ for 7=1,2,...,Jg and d=1,2,...,D and let y be the vector 
obtained by stacking the y4;’s into a column. Using the definition of extended 
likelihood in Pawitan (2001) we may write the likelihood function for f, g and 
u =(U},U5,...,Ufp)! as 


L(B,9,u) = f(y |u)f(u) 
eee D 
7 TEL Yo | «| Ere} (3.1) 
d=1 


d=1 i=1 


Pawitan also notes that this definition of the likelihood is called hierarchical likelihood 
by Lee and Nelder (1996). 


For the estimation of B and @, ideally likelihood based estimation should be based on 
maximising L(B, g), where 


D Iq D 
L(B,g) = | Ti [fOr 2a | oT Fs} 
d=1 


d=1i=1 


D © Poo La 
=|] [ete (Ug) Tl f Van Vai2 Ua } dg dug (3.2) 
d=1 i=l 


To estimate B and g one could try to maximise the marginal likelihood defined at (3.2) 
using, for example, Monte-Carlo methods or numerical integration techniques to 
evaluate the integrals. A procedure like Newton-Raphson could then be used to solve 
the likelihood equations since the equations are non-linear. More specifically, 
Hartzel et al. (2001) describe an adaptive Gauss-Hermite, quasi-Newton algorithm 
which could be used in the multinomial logit mixed model case. This method is 
appropriate when the dimension of the integrals are small and the dataset size is not 
large. In our case, the dimension of the integrals are small but the dataset size is large. 
That is, there are a large number of double integrals to evaluate at each iteration and 
convergence will therefore be very slow. 


Another method which could be used to maximise (3.2) is to use an automated 
Monte Carlo EM algorithm as described in Hartzel et al. (2001). Again this will be 
computationally intensive in our case. The method consists of implementing an EM 
algorithm which treats the random effects uw as the missing data. In the E-step a 
conditional expectation needs to be evaluated and this is approximated by using 
Monte Carlo methods (actually, an independent sample from the conditional 
distribution is generated). The M-Step is then undertaken by maximising this 
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approximate expectation. At each EM iteration, the Monte Carlo sample size is 
increased in an automated way until convergence. In any case, for computation 
reasons we do not pursue exact marginal likelihood maximisation approaches further 
here. Instead we will now discuss approximate methods. 


The approach taken in Molina et al. (2007) is to use the PQL (penalized 
quasi-likelihood) method introduced by Breslow and Clayton (1993) combined with 
either ML (maximum likelihood) or REML (restricted maximum likelihood) for the 
variance components. The PQL method of Breslow and Clayton (1993) obtains 
estimates of 6 given g by maximising an approximation to the marginal likelihood 
L(B,g). Part of this approximation involves a Laplace integral approximation. 
Estimates for zw are also produced as a by-product of the approximation and hence the 
method produces joint estimates of B and uw given g. As Jiang (2007) states, the POL 
method is equivalent to the maximum hierarchical likelihood method of Lee and 
Nelder (1996) in the case of normality of the random effects. The maximum 
hierarchical likelihood method obtains joint estimates of B and u by maximising the 
hierarchical likelihood (the log of (3.1) in our case). Interestingly in the case of 
normal linear mixed models the estimate of 6 obtained by maximising L(f, g) and 
L(B,9,u) given g are equivalent. But this is not the case in general for GLMMs. 


As mentioned by Jiang (2007), Lee and Nelder (1996) showed that in general the 
maximum hierarchical likelihood estimates of the fixed effects are asymptotically 
equivalent to the marginal maximum likelihood estimates of the fixed effects. On the 
surface this suggests that when the sample sizes are large, then to obtain estimates of 
f, maximising (3.2) and (3.1) are equivalent. However as Jiang (2007) states, the 
asymptotics here are in the sense of the cluster sample sizes approaching infinity, but 
the number of clusters remaining bounded. This is not satisfied in our small area 
estimation case since the number of clusters (small areas) is large but the cluster 
sample sizes are small and bounded (< 10, since the sampled units are the age/sex 
classes). Therefore estimators for PB derived from maximising (3.1) will not be 
equivalent to maximising (3.2) even as we increase the number of small areas in the 
sample. As Jiang (2007) also states, there are a number of approximations involved in 
deriving the PQL and these approximations have introduced bias into the estimates 
and this bias does not vanish asymptotically (PQL estimators are known to be 
inconsistent). When the cluster sample sizes are small there is insufficient information 
to estimate both the random and fixed effects B simultaneously. 


Hartzel et al. (2001) points out that PQL methods have been shown to be biased 
especially for highly non-normal cases such as Bernoulli response data and biases tend 
to increase as the variance components increase. Hartzel et al. (2001) suspect that a 
similar problem will exist for the multinomial logit random effects model when the 


multinomial sample sizes are small. In our case the variance components are 
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expected to be small and the average 7, is approximately 13. Also note that the 
number of multinomial observations per cluster is < 10. In most cases these are 
exactly 10. Therefore the m,; and cluster sizes might be sufficiently large together 
and the variance components sufficiently small to allow the PQL estimators to work 
reasonably well. This will of course need to be confirmed via simulation which is 
undertaken later. 


In any case, two issues have been highlighted so far. The first is the computational 
difficulty of maximising (3.2) directly due to the presence of the integrals which have 
no closed form solution and the second is the inconsistency of the PQL estimates 
associated with maximising (3.1). These two issues give some motivation for trying to 
come up with alternative estimators. Jiang (1998) proposes an alternative estimation 
method called the method of simulated moments which is both computationally 
attractive and results in consistent estimators. A set of estimating equations are 
obtained by equating sample moments of the sufficient statistics to their expectations. 
The expectations are then approximated by simulating sequences of normal random 
variables (i.e. the integrals in the expectations are approximated by Monte Carlo 
simulation). The equations are then solved by a Newton-Raphson procedure. 
However, there is one issue associated with this method. Jiang (1998) shows that the 
method of simulated moment estimators can be quite inefficient. For small samples 
the method of simulated moment estimators seem to have substantially larger 
variance than estimators based on PQL. So it appears there is a bias versus variance 
trade-off when choosing between such methods as PQL or the method of simulated 
moments. There are clearly issues associated with all estimators that have been 
discussed so far. 
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4. PQL ESTIMATION OF Z AND zz 


To obtain the PQL estimates of B and u we need to maximise the log of the joint 
likelihood defined at (3.1). Assume for the moment that the variance components @ 
are known. The joint log likelihood is 


1 D a3 D qq 3 
WB,uy=c-= SY uyWy'ta + YY Vai 108 Pay: (4.1) 
2 G1 d=1i=1 j=1 


where c is a constant. The maximum likelihood estimators can be obtained by 
equating the first derivatives of (4.1) to zero and then solving this system of equations. 
Let g=1,2,...,Q; index the components of the vectors 6; and xq and denote the q-th 
component of each by fq) and xgq respectively. By noting that 


exai By tua, 
Pain = 
1+ exahy tea a e%ai2Pe +Ug2 
ex arb, +Ug2 
Pai2 = 
1+ exa hy tua, + exarP, tug? 
_ 1 
and Paz = 


+ t 
1+ exahi tua + er anrPr +Ug2 


after some algebra it can be shown that for = 1,2,3; 7’ = 1,2; g=1,2, eer Oy 
GaN 2 et ana = 2 od), 


GOS Pap Xayiqy(1— Puig) iff = 7 
OP (4) —Xaij'(q) Paij' otherwise 


and for’ = 1,2 andqg=1,2,...,O;, 


Up.u) _ 
B Pot) MS DX ai'(@) (vay Mai Pay): (4.2) 
Big) aia 


Now we need to find the first derivatives with respect to the random effects. Again 
after some algebra it can be shown that for /= 1,2,3 ; 7! =1,2 ;7=1,2,...,Jg; 
d=1,2,...,D and d' =1,2,...,D, 


i l- pay iff= jf andd=d' 
O ms 
8 Pai =4 Dag if7 #7’ andd=d' 


Ou gi! 
“ 0 otherwise, 
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and fory’ =1,2. and @ =1;2;...,D, 


peep ae aes Vaij 108 bay) iF, 
~ (vay —Mq';Paiy'): 
i=1 


OUgy 


Now we note that ford= 1,2, oe _D, 


E 1 

‘eet 2 2 

and uUgWy Ug =——> (Ca — 20 2U gaz + PUa2 ). 
Pia P12 


Therefore for d' = 1,2,...,D andj’ = 1,2, 


a Eee ——— (P2Mg — Pia) ifj"=1 
(Dia ua] _ PM -%2 
Ou te —] “cor 
i > Pita Pitan) iff =2 
PiPo Pia 


and hence for d' =1,2,...,D andj’ = 1,2, 


I 
= ' i 
(2a - Pr2 Marr) + VOay’ —Mg; Pay) iff =1 
Ol( Bu) _ )AP2—- P12 i=1 (43) 
Ougiy' =] Iq or 
(01 Ma'2 - P12 Ua) + VMOay —Mg; Pay) iff =2 
PiP2— Piz =i 


Estimates for B and u are found by equating all the derivatives defined by (4.2) and 
(4.3) to 0 and solving the resulting system of equations. Because these equations are 
non-linear they cannot be solved directly. Instead they can be solved by using a 
Newton-Raphson algorithm. In order to use the Newton-Raphson method we will also 
need to work out all the second derivatives of the loglikelihood function. 


For’ =1,2;7" =1,2;q=1,2,...,Qy andq' =1,2,...,Q, it can be shown that 


4 : “cyt on 
OU Bu) Didar dain diy (aay "ai Pay’ A- Pay) iff = j 


OB iq) P iq) 


D Ly scot n 
De dat Dain ai ayaa" (gai Pay’ Pai a 
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Alsotord 2172.4) Dvd S412 Dey = 1.2 andy" =11,2 


-— i sees ete 
4-4 mai Pain Pan) ifd"=d and / =j =] 
PiP2— Pip 
ant nee 
OU(B,u) _ | PiP2-Vi2 
Chai lay : " ’ nn ” 
_ = Pe mai PanParr ifd" =d' andj" # j 
OEP aT P12 


: j " ’ mn or 
— mai Pair2G— Pair) if@' =d' andy" = 7"=2 


0) ifd" 4d", 
! “ff | 
and ford” =1,2,...,.D;f =1,2;7 =1,2 andqg=1,2,...,Q;, 


. s nw of 
a7 B u) is Xaty qa Paty - Paty’) if/ =j 


Ou grin OP r 
PPI YS sguipice To 
bm Xariy'(qy "ai Pari Pay" if/ x J : 
Similar to Molina et al. (2007), let 04 =(0gi1, 942)’, where for j= 1,2 , 


Oij= ioe 


Pdi3 


We can also write (2.2) as 


04, =XgqjBr+Zqu, 


t 
Xai 1xo, 
where X qj = ; 


t 
O90, *Xai2 


Onda 1 9  O1,2D-a) 
ya 


Z 


Onda 9 1 O1,2D-a) 


and Og+xp+ denotes a matrix of zeros with dimension a* xb*. 
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Denote the mean and covariance matrix of yy given u as Ug; and Xy; and these are 
= t 
Hay = "gi Pairs Pai2) 


fs G- pan) —PanPair | 
and adi = MN gi 4 
— ban Pai2 Pai2A- Paz) 


Again similar to Molina e¢ a/. (2007), let Sz and S;, be the vectors of first derivatives of 
the loglikelihood. That is, 


_| A(B,u) UW Bu) aA B,u) A(B,u) A(Bu)y A(B,u) 
Pia Bia’ Byo) Seay Arey” Pac, 


O(B,u) A(B,u) (Bu) (Bu) a B,u) a) 


and Sy, = Fe ) Fe. ) Fe. Ps 6 yeeey Fa. ) 6 
Uy1 U2 U1 U9 Up Un2 


In matrix notation we have 


Dla 
Sp = = Xa (Vai — fy) 
d=1i=1 
Pi ee 1 
and =) DZ (Yai — Hai) — >, ZnWa Za 
PzE d=1 


and the non-linear system of equations that we need to solve is described by 


t 
t t 
S =(55,81,) = 0(0,+0,+2D)x1 ° (4.4) 
Let 
Jp bea square symmetric matrix containing all the derivatives of Sg with respect to , 
Ju be a square symmetric matrix containing all the derivatives of S, with respect to z, 


Ju be the matrix containing all the derivatives of S, with respect to f. 
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It can be shown that 


Dla 
Jp =-)) X girudiX di » 
cere 
Dla, 
J pu = => VX aZ a 
d=\i=1 
ae aaa 1 
and Ju=-> > 2a aZa- > ZnW7 Za - 
dAzA d=l 


The Newton Raphson algorithm can now be applied to find the solution to (4.4). This 
iterative algorithm has updating equations as follows 


A=) 
> 


-1 
VALE dae. lp cy ae 
we? u* Fru ps | | 


> 


where the superscript & indicates the iteration number and the current values of all 
parameters are used to evaluate the functions within. Note that in its current form 
(4.5) contains a large matrix which needs to be inverted at each step. This term can 
be further simplified by noting the following partitioned matrix identity given in 
Henderson and Searle (1981) 


Jp J pu 7 
Vu Su 
-1 = 
(Fp -Sputu' Tou) -Ip-JputuTiou) I pula’ a 


-1 -1 
Fa pu(Tp—TpulaTu) Fa +Fa'Tisu(T pS puta' Tou) I pula’ 


for any square matrices Jz and J, with J, nonsingular and J, possibly singular. Molina 
et al. (2007) also make use of this identity. Note that this identity simplifies the 
inversion quite a lot since we no longer need to invert a square matrix of dimension 
Q1+Q2+2D. Instead we need to invertJg Jp Ju Sp, andJu. The matrix 

J pS puta’ J py 18 a square matrix of dimension Q1 +Q2 which is the total number of 
explanatory variables in the model and is a lot smaller than QO; +Q2+2D since D is 
large. As for Jy, this is aa square matrix with dimension 2D which is still quite large. 
But note that J, is a block diagonal matrix with block sizes of 2. So all as we need to 
do in this case is invert a series of 2 by 2 matrices which is trivial. 
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So to compute PQL estimates of B and u we use the iterative formula (4.5) until 
convergence (and using the identity given at (4.6)). An initial guess is needed for the 
parameters to start off the iterations. We suggest using a small value of u as its initial 
value and an initial value of B obtained by fitting a multinomial logit model without 
random effects. 


The criterion for convergence we use is one given in Booth and Hobert (1999) 


Bin ~ Bra] poo <i" | 
(a) Pig) ,f =1,2 and q=1,2,...,Q;, 
|Bhalta 
max SSR oy 
k+1_ oR 
ei — | ; 
———— ,d=1,2,...,.D andj =1,2 
wakes 


where €, and €2 are both small positive numbers (we use €; = 0.01 and €2 = 0.001). 


In all of the above it was assumed that g is known. In the next section we derive an 
approximate maximum likelihood estimator of g given the other terms. To obtain 
joint estimates of , u and g we will need to iterate between updating each, where f 
and u will be updated using the algorithm in this section. 
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5. APPROXIMATE ML ESTIMATION OF 


It was mentioned in Section 3 that for the normal linear mixed model, given the 
variance components, estimates of f obtained by maximising L(f, 9, ) are equivalent 
to maximising L(B, g). However for the normal linear mixed model, given £, ML 
estimates for the variance components based on maximising L(f, g,u) and L(B, g) are 
not equivalent. This suggests that it is probably inappropriate to maximise L(f, g, uw) to 
get estimates of g in our situation too. 


To obtain an approximate marginal likelihood L(B, ), Molina et al. (2007) adapted the 
ideas of Schall (1991) to their bivariate setting. We will follow this approach too. 


Assume that f and uw are known. 


V dij ) dij 


= log 


for j = 1,2. 
Ma — Van ~— Vai2 Vadi3 


ket 8; Va) = log 
A first order Taylor series expansion about the point wg; leads to 
Og ; Og ; 


(Yai — Lair) r 
Vain Ma j Yair Ma 


8; Vai) = 8; Hai) + (Yai2 — Ha2) » forj=1,2. 


Let €4; =(g1(yai), S2(yar))' and eg; = XG} (vai —Madi )- 


Calculating the expressions of the derivatives involved and using matrix notation, the 
above Taylor series expansion becomes 


bai ~ Xai B+Zqurteg; » 


where Varley; |uq) = Zz} and Eq |ua) =022. 
It is also clear that E(€q; |ua) =Xai B+Zaiu and Var(Eq |ug) = Xz}. 


Although it is not clear in Molina et al. (2007), on correspondence with the author the 
following was established. The term €, |ug is assumed to be approximately normal. 
Since wg is normal, this also implies that the joint distribution of €g; and ug is 
approximately normal and hence the marginal distribution of Eg is approximately 
normal. 


Let W =diag(Wy,d=1,2,...,D). Now it is easily shown that E(éq;) =X qj B and 


Var (4) = E (Var (4; |ua)) + Var(E (Ei |ua)) = E (Eat) + ZaiWZty 


The matrix X; is a function of the random effects and we now need to assume that 
this is approximately constant and hence Var(éq) = per +Za;WZ',,, where the random 
effects are replaced by their estimated values. 
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Let € denote the vector that is constructed by stacking the vectors Eg in one column 
and V=Var(é). 


Then V=ZWZ' +="!, where © =diag(Xq;,i =1,2,...,Jg,d=1,2,...,D). 


Now we can define the approximated normal log likelihood for g as (ignoring 
constant terms and assuming f is known) 


I(p) =—Slog|v|-5(§- XB) V'(E-X8), 


where X is obtained by stacking the matrices Xz. Also let Z be the matrix obtained by 
stacking Zg;. Given B, we can now obtain approximate ML estimates of g by 
maximising /(g) with respect to g. However, before we do this we need to simplify /(). 


To compute the PQL estimates of 6 and u, we use the updating equations (4.5). As 
noted by Jiang (2007), the following alternative iterative procedure originally 
proposed by Breslow and Clayton (1993) can also be used to produce the same PQL 
estimates. That is, for fixed g compute 


B= pores) x'vilg 6.1) 
and u=wz'v"(é-xB), (5.2) 


where given € one may first use (5.1) to update B, then use (5.2) to update # then 
update € and so on until convergence. Note that we do not use (5.1) and (5.2) to 
update f and u because (4.5) is computationally more convenient. However the form 
of (5.1) and (5.2) is useful here because we can use these to help simplify /(g). 


From (5.2), (5.1) and the results on page 446 in Pawitan (2001), 
(¢-xh) v\(€-xf)=(€-xh-za) x(¢-xp-za)+a'wla 
and lV|= fet iw ||z'ez +), 


We can now write an approximated pseudo loglikelihood of g as (ignoring terms that 
are not functions of @) 


1 1 En eee 
(g) => log|W|-Slog|z'Ez + |-5 4°74. (5.3) 
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Given current estimates of B and uw, to obtain approximate ML estimates of g we need 
to differentiate (5.3) with respect to g, set the resulting three equations to 0 and solve 
them. 


In the derivation of /(g) it was assumed that Z was constant and does not depend on uw 
or B. When this is the case, joint estimation of f, g and u is equivalent to maximising 
the function 


O(B,9,u) =1(B,9,u) - slog Z'=Z+w (5.4) 


00 _ A(B,9,u) 2Q _ AMB.g.u) , , 62 _ ag) 
Op Op ° du Ou 6p dp 


since 


This justifies the following algorithm to obtain joint estimates of 8, g and u 
1. Compute B and “ given by using (4.5). 
2. Fixing Band w at their current values B and #, update g by maximising (5.3). 


3. — Iterate between 1 and 2 until convergence. 


However this algorithm assumes that 2 is constant in (5.3). Pawitan (2001) notes that 
this algorithm is appropriate when X~lis a slowly varying function of u (w is the vector 
obtained by stacking all the wg;). This means we can ignore the derivative of the 
second term of O with respect to f and w, so the first step is justified. 


Pawitan (2001) notes that for certain generalised linear mixed models studied by 
Breslow and Clayton (1993), estimates of the variance components based on 
maximising (5.4) are close to the exact marginal likelihood estimates provided that the 
variance component is not too large. The method tends to underestimate the 
variance component, and the problem can be severe for large values of the variance 
component. However in our application, the variance components are expected to be 
small, so hopefully this should not be too much of an issue. 


We now need to maximise (5.3) and to do this we need to differentiate this function 
with respect to each component of g. For a= 1,2 and 12, 


al(g) __ 1 dlog|W| | dlog|z'Ez +] 1 du'W'u 


OD, 2 0, 2 OP, 2. OG, 
-1 -1 -1 
abe | ye ee, (z'xz+w-") VES ee OU. ES 
Z 09, | 2 OP, 2 OP, 


where 77| | denotes matrix trace. 
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After some algebra, it can be shown that 


mw | ee ae 
OY, P\P2 — Pi2 
mw OW | - DB, 
092} O2.- Pi 
and now ay = eal 
092) %-—) 
ow! 1 2 -93 
Aisé, ut - on : ul, | P2 aes Ua, 
ual (a. ~ Qh d=1 \P2Pi2 —Pi2 
aw? D 9, 
ae W i yu! ul, PP \2 oe 
OP, (ee = -o; 
(P.—2- A) d=1 \A1Pi2 P1 
,ow! D 29122 -(9B + 2102) 
and u u= Yu; uy ug: 


“P12 ae P12 (o.a.-02) = -(9%2 - p12) 291 P12 


Now for d=1,2,...,D let 
I 
dai = aes Ma; PaiA- Pai), 
I 
qa2 = ae Mg Pai A- Pai) and 
I 
4a3— > a MN gi Pain Pai2 - 


After some algebra it can be shown that 


Tr [32 + ae = 
1 


S da Pr2 + Q2 (14244392 +4a292) 


(5.6) 


6.7) 


(5.8) 


6.9) 


(5.10) 


6.11) 


d=\| (-9%3 + 9192)((1+4a3%12). + (442-4431) 2+ Gai (P1 -Fa2Pin +442019)] 


? 


(5.12) 
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P.+ dart +2443 P12 + Qa2P 2 
= 2 2 2 2 
d=1 (-¢2, +12) ((1 + 443912) +(4a2 —4i301) +day (, —4a2Pi2 +4401) 


(5.13) 


>} (a2 +d PiPi2 + a3 Pi + 4q3P\ P2 + 4421292) 


a= (oR -%92)((1 +4432) +(da2— Was) 2+ Fai ( A —4a2Pr2 +4401) 
(5.14) 


By substituting the relevant terms (5.6)—(5.14) into (5.5), we can now calculate the 
first derivatives of /(g) with respect to 91, g2 and @12. 


t 
1 5 -| AM) Ag) Ho) 
‘ 69, 69, ° OM2 ) 


To obtain an update for g given f and u we need to solve Sy = 03.1. The multinomial 
model described in the paper Molina et al. (2007) has one variance component. In 
this case, an explicit updating formula is available for the variance component based 
on rearranging the single equation 


AP) _o 
dp 


An experiment was undertaken to determine whether simple updating equations 
could be obtained by rearranging the equations Sy =03,; but we could not get it to 
converge. Unfortunately we will need to use a single iteration of the Newton-Raphson 
algorithm to update g instead, which means we will also need all the second 
derivatives of /(Q). 
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After some algebra it can be shown that 


0*1(@) = 1 Sy ‘gf ~3 ~ 9193 4 
Oy; 2 Bee. ge: eee 
i (-92 + e192) = 1292 Pi2P2 


D (4a +4a4a2P2- Gis) 
+> dl da\td2r2 d3r2 


2 ’ 
ae 2((1+ dase y +(da2 — das) 2 +day (a, = Gao i2 +44202)) 


2 3 
e719) _ 1 Su! 20122 -(9i; + PP 1202) 7 
O90, _@2 re ‘ _( gp, + 20,07 
2( Pi2+DiP2 Pi2 + PiP12P2 P1Pi2 
; 2 
D (443 — 4g a2Pi2 +4312) 


£2 


2 ’ 
Ga) 2((1+ dase. \: +(4a2 — das) 2 +dqi (a — Gar? +da2010)] 


OUg) _ 1 Fa 409; (302+ 12) _ 
0M O%2 2(-92 +919» y 4 | @:(302 +9102) —2(02 +9022) 

S (4a — ay Vd2P 12 + 443012) (42302 —dai (1 +4a292)) 

d=1 


2 p) 
(a +4as12) +(4a2- 44301) P2 + Far (P -4a2i2 + dax002)) 


Ug) __ 1 5 | Pez ~ PP x2), 
0p; 2 = : -9 Q ‘ 
2 (-9%: +192) 7 1rl2 1 
, 2 
D (Gaz +419 a2 - a3) 


223 2 2 2 a 
aot 2((1+ aase.2) +(da2 — Gas) 2 +a (2, — Fa2P 2 +442002)) 


719) 2 1 Ss a -2( 9 . P1202) P| (30% +9192) 
3 d 


ug 
092092 2(-o0: +Q,Q> d=1 PY (30% +P P2 AG P12 


D (daz +4arda2Pi ~ Fas) (1aa2i2 — Fas (1+ 443012) 


2 
*1((L+ GasQi2) + (daa Gis) 2+ dan (1 a2 +4a20102)) 
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d°1(g) _ 1 Ss ae P2 (301 +192) -(9i +30,01202) 7 


fiz (-ph +09) = (92 +3092) (302 +2) 


2 
D 2 = +q% 
+) (403 a1 d2P 12 qa312) 


d=1 


2 
(1+ dase Y + (da2- 424301) 92+ dan (91 - Fain + 14202 ) 


2 —(4ai9a2)+ a3 


ai(1+4a3P.2) + (4a2 ~ 44301) + dai (, — Jair + arin) | 


Now let 


alg) le) al@) 
ao, 09,09, O9,0M.2 


alg) alg) eg) | 
OPO, 00, 092092 


alg) dle) dl) 
OP,OP,2 0920 Pp 00 


To obtain an update of g given B and u we simply compute 


update _ _ previous -1 
P = @ =—So So ) 


where all terms on the right hand side are evaluated using the current values. Only 
one update is needed because after g is updated once we then have to go back to 
updating f and uw by using repeats of (4.5) until convergence given the current value 
of @. 


The iterative algorithm for computing joint estimates of 8, u and g stops when both 
(4.7) and the following condition is satisfied for all three variance components 


update previous 
Pa — Pa 
. < €), 
previous 
Pa +é 1 


where subscript @ denotes the particular component. 
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6. APPROXIMATE REML ESTIMATION OF 


Approximate REML (Restricted Maximum Likelihood) estimation could also be used as 
an alternative to approximate ML estimation for estimating g. This approach was also 
undertaken by Molina et al. (2007). 


For normal linear mixed models, it is well known that in some cases the ML estimator 
of the variance components can be downwardly biased (see page 235 of Rao and 
Kleffe (1988)). As mentioned in Harville (1977), one criticism of the ML approach to 
estimating variance components is that the ML estimator takes no account of the loss 
of degrees of freedom for estimating the fixed effects £. For estimating the variance 
components, f can be considered as a nuisance parameter. A way of eliminating the 
influence of f is to construct a marginal (or approximated marginal) ML estimator for 
the variance components. In the normal linear mixed model case, this involves 
transforming the original observations such that the new data are independent of 8 
and then maximising the likelihood for the variance components of this new data. 
The transformed data are called error contrasts as described in Harville (1977) and 
have smaller dimension than the original data. 


In the normal linear mixed model case, the REML loglikelihood can be derived as a 
modified profile likelihood (see pages 286-292 in Pawitan (2001)). In our case the 
approximated REML loglikelihood is proportional to 


I(p)—log|x'v-"x|, (6.1) 


where /(g) is given by (5.3) and as Pawitan (2001) mentions, the second term can be 
interpreted as a penalty term, subtracting from the profile loglikelihood the 
‘undeserved’ information on the nuisance parameter (which is 8). To obtain 
approximate REML estimates of g in our case, we simply need to maximise (6.1) which 
can be done using similar methods to what is described in Section 5. That is, we need 
to implement the same Newton-Raphson algorithm except that now Sy and Jy contain 
extra terms. These extra terms can be obtained by finding all the first and second 
derivatives of the second term in (6.1) with respect to all the variance components. 


Although we are potentially reducing bias by using the approximated REML estimator 
we should keep in mind that the variance may be larger than the variance of the 
approximated ML estimator. Therefore in some situations the MSE of the 
approximated ML estimator might be smaller. See Harville (1977) for further details 
for a discussion in relation to normal linear mixed models. 


In any case we need all the first and second derivatives of log |X‘V-1X]. 
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For a@=1,2 and 12, 


ee 7 
log) x" V x|_ rr] (x! os ix)" “i av — 
OPg a 
=f 
Now Wey yey 
OP, OP, OP, 
and therefore 
@log|X'v |x 
| | n(x" vo 1). xtyz 7 vx] (6.2) 
OP, OP, 


ow ; 1 0 ; 0 0 ; 0 1 
where —— = diag or diag or diag ; 
OP, 0 O O01 1 0 


depending on which value of @ is chosen. 


On first glimpse (6.2) looks quite difficult to calculate, but note that the matrix within 
the trace is only a Q; +Q2 dimensional matrix in total. Because V and W are block 
diagonal (V has d= 1,2, ...,D blocks each of size 2/7) simplifications can also be made 
within the matrix multiplications in the calculations. 


Now we need to calculate the second derivatives. For a@=1,2,12 and b= 1,2, 12, 


6* log|x'v 1x Si 
| | ey (ev x) xtyntg OM gi Vy 
OD, OP, OP, OY 
-1 
alx'’v tx 
Tr | xtytzg  gtyly 
OP, OPa 
-1 
=2Tr (x'v-'x) ba aw A ow 7! 1 ey eee ow Z'v'x 
OP, OP, 
24 -1 -1 
+Tr (x'v-'x) x ex) xtytg ™ giy-ly 
OP, OPy 
= 
=2Tr (x'v'x) x'vi'z a aa eae aw. Ag a ¢ 
OP, OP, 
zi 
Tr (x'v'x) xtytzg ‘vtXx(x! vx) xtytz YY giy-l |. 
OP, Og 
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Now let 


BA 


A dlog|x'v tx’ dlog|x'vV~'x| dlog|x'V~'x| 


Ss = 
Q a oy, Op, OP2 
and 

o? log|x'Vx| 6 log|x’v-'x| 6 log|x’v-lx 

2, OP,OP2 OP O92 
gest a? log|x'Vx| 2 log|x'v-tx| a log_x‘vx 

Bona BD 09,09, 0°, 092092 
a? log|x'V"x| 7 log|x'v-tx| a? log_x'v x 

OP,OP\2 0P,0P2 OQ 


where Sy and Jy are as defined in Section 5. 


To obtain a REML update of g given Pf and u we compute 


update __ revious *\-1 o* 
P a = g? -JS) So ’ 


where all terms on the right hand side are evaluated using the current values. 


ABS * SMALL AREA ESTIMATION USING A MULTINOMIAL LOGIT MIXED MODEL * 1351.0.55.029 


23 


7. EMPIRICAL BEST PREDICTION, SMALL AREA ESTIMATION 
AND A NOTE ON MSE ESTIMATION 


Now that we have derived the PQL and REML estimators of the model parameters, we 
would like to use these to help predict the small area totals of employed, unemployed 
and not in the labour force. These sets of totals are, for7=1,2,3 andd=1,2,...,D, 


it) I 
= r 
Say = DiVaiy + LV 
i=1 i=1 


where Vaij denotes the total non sample in small area d, age/sex group 7 and labour 
force class 7. Note that /= 10 (total number of age/sex classes within a small area). In 
previous sections, summations were defined using Jz and classes [7 +1, ...,/ were 
excluded because they did not contribute to the likelihood functions since there was 
no sample in these classes. However, we now need to include these. 


Obviously we cannot use dg directly because all the Vij are unknown. If we knew the 
values of # and uw then we could estimate 6g using 


x La I 
Say = DiVaig +L May 


7=1 7=1 
2 exaBy tua, 
May t t ify ma 
1+ exanBi tua + exa2PrtUar 
Z i; ex arb, +Ug2 
where Haij =) WG ea if7 =eZ 
ie exaihi tua ae exarPs +Ug2 
1 eas 
maj if7 = 3; 


1 exan hy ta, oy eX ai2BrtUa2 


and m’,, is the total non-sample in sex/age group /, in small area d and these are 
di Pp se group 


assumed known. Again 0g; cannot be used here directly since f and u are unknown. 


The prediction problem we now have is one where we need to predict 
I 
aa r 
la » Haiz» 
7=1 


for j= 1,2 (/ =3 is not needed since it can be obtained via subtraction since m7, is 


known). There are two ways to predict tg. The first is to simply replace uw with the 


PQL estimate # and replace f with the PQL estimate B. 
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I 
That is, tay = Map (7.1) 
7=1 


where the hat on “7; means that we are using estimates of 8 and wu within this 
function. This is the approach taken by Molina et al. (2007). An alternative way of 
predicting ty is to use an empirical best predictor (EBP) as defined by Jiang (2007, pp. 
143-144). Assuming that 6 and g are known, the best predictor in the sense of 
minimum MSE of ty would be 


E (rg \y) = [lta t (Ha |B. 9: ¥q) dug dug 


An empirical best predictor replaces B and g in the integral above with respectively the 
PQL and REML estimates B and @. The double integral above has no closed form 
solution but we could approximate it using Monte Carlo methods. 


Because we need to approximate the double integrals in EBP using Monte Carlo 
methods, this adds to the computation time and MSE estimation becomes a problem. 
For instance, bootstrap MSEs (and those based on other resampling methods) are not 
computationally feasible for EBP. Alternatively Jiang (2007) on page 144 describes a 
method of approximating the MSEs of EBP based on a Taylor Series expansion that 
gives an estimate whose bias is corrected to the second order. However in the 
derivation it is assumed that the estimates of B and g have certain properties which 
PQL estimators may not satisfy (See page 158). Therefore these MSE estimates may 
not be correct to second order in our context. Another problem with this MSE 
estimator is that one of its terms relates to the expected value of the EBP squared over 
the distribution of y and this calculation is not computationally feasible for our model. 


One advantage of using the estimator at (7.1) is that MSE estimation is relatively 
straight forward. Bootstrap MSEs are computationally feasible or alternatively, 
estimators based on Taylor series approximations can be used. In particular the 
Taylor series approximation method in Molina et al. (2007) could be extended to our 
category specific multinomial mixed model case. In any case it is not even clear 
whether the MSEs will be smaller for the EBP than those for (7.1). Therefore at this 
stage EBP is not recommended. 
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8. ANALYTICAL APPROXIMATION OF THE MSE 


For reasons discussed in the previous section we will be using the PQL estimators of 6 
and uw along with (7.1) to predict the small area totals. To approximate the MSEs of 
these small area total estimates, we use a similar approach to the one described in 
Molina et al. (2007) based on an analytical approximation. We begin by briefly 
outlining this approach and then later in this section we describe the technical details 
of the MSE approximation. 


As mentioned in Molina et a/. (2007), under linear mixed models, Prasad and Rao 
(1990) obtained an analytical approximation of the MSE of an estimator of the type 
t(D) = A!B+m!'a where 4 and mare vectors of constants and v and @ are respectively 
the vector of variance components and the estimated vector of variance components. 


The Prasad and Rao (1990) approximation takes the form 


MSE (t(0)) = G,(v) +G,(v) + G3(v) 


and the estimator is given by 


MSE (t(b)) = G,(b) + Gy(6) + 2G3(0) 


which corrects for the bias in the G;(@) term. In the derivation, Prasad and Rao used a 
result from Kackar and Harville (1984) who showed that under certain conditions 


MSE (t(b)) = MSE(t(v)) + E(¢(@) -t(v))). 


In the Prasad and Rao context 


MSE(t(v)) =G(v)+G,(v) and G3(v) = E(u) —W())*) 


and Prasad and Rao (1990) proposed a new approximation to Ga(v) relevant in a small 
area estimation context. 


Prasad and Rao’s formula was adapted by Baillo and Molina (2005) to a multivariate 
mixed linear model and a multidimensional parameter. Both the Prasad and Rao 
(1990) and the Baillo and Molina (2005) MSE estimators are for linear mixed models. 
These estimators can be adapted to our context by noting that our model can be 
written as an approximate bivariate linear mixed model (see Section 5 for further 
details). In our context we use the predictions (7.1) for the non-sample labour force 
counts and in order to apply the Prasad and Rao (1990) and Baillo and Molina (2005) 
MSE approximations, we need to linearise (7.1) using a first order Taylor series 
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approximation (Molina et al. (2007) use a similar approach). This then results in a 
MSE approximation of the form G(g) + G2(g) + G3(g). There is also an additional 
term needed called G4(g) which is added to the above MSE approximation. This extra 
term comes from the fact that we are estimating the actual non-sample counts ¥; Vij 
and not simply t7Which introduces additional variability (this extra term is also found 
in the MSE approximation in Molina et al. (2007)). 


Note that the multivariate linear mixed model in Baillo and Molina (2005) is different 
from our approximate multivariate linear mixed model. In Baillo and Molina (2005) 
there is only one random effect associated with each small area and we have two 
correlated random effects in each small area. Therefore the formulas in Baillo and 
Molina (2005) cannot be immediately applied to our case. Nonetheless we are still 
able to produce a MSE estimator but note that it cannot be guaranteed to be accurate 
to a known order even if the multivariate linear mixed model was not an 
approximation (a more detailed and rigorous proof along similar lines to pages 7-17 
in Baillo and Molina (2005) would be needed for that). In any case, we show in later 
sections that our MSE estimator performs well. We now derive this MSE estimator. 


For d=1,2,...,D, let dg =(6a1,6a2)’ be the vector of small area totals that we are 
interested in predicting (note that the third total within each small area can be 
obtained via subtraction). Now 


1 I l 
54 = Dat Mai +> (vn - Hi), 


i=1 i=1 i=1 


where yj and 7), are respectively the vectors of sample totals (known) and 
non-sample totals (unknown) and 


t 
Mii = Mui (Pai: Pair) ? (8. 1) 


where m/,, is the total non-sample which is assumed known. 


Now define 
: 1 I 1 
r 
6g = YD vai $0 ali = YD yvai +Tq 
i=l i=l f1 
7 1 1 rg 
and 64 => Vai t+ > Ma = atta» (8.2) 
i=1 i=1 7=1 


where Bayi is (8.1) with the w’s and f’s in pg and paz replaced with the PQL 


estimates # and f. 
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The estimator 6, is used to estimate 6. We will define the MSE matrix of 6, as 
use (8,)= #{(61-81)(8a-4a) | 

= (1-8y +54 -64)(bu 5a +4484) | 

= 6 (6. -6,)(6, 51) + ( (Bi - 54) (54 -8,) } (8.3) 
The MSE matrix contains MSE (6; ) and MSE(6,) and the cross product term 

#((5n.-80\(bn.-6a2) 
Note that the terms 
#((1-81)(8-4) | 

ana #{ (Su -8a)(6a-4a) | 


are omitted from (8.3) because these are matrices containing all zeros. This is 


because given ug, dg—6g and bd —6dq are independent. 


neue #( (Bu ~8a)(6a-8a)'}= #{ #((B ea) #{ (5a 8e) 
and note that B((é, -54)|ta) =O. 


By a similar argument, 


#{ (ba ~5,)(5u 8a) | Sree 


#| (6a -&4)(51-8a) }= (fa 2a) (Ea ~ta)!)= MSE (Ea) 


28 ABS ¢ SMALL AREA ESTIMATION USING A MULTINOMIAL LOGIT MIXED MODEL * 1351.0.55.029 


#{ (6, ~8,)(5a-84) \=E [S(08- Ha) | Blo -1i)) 
=E\E S(04- -1in)| Eva-Hi)) Ud 
= zp a ; (8.4) 


Pai (1 — Pai ) —PainPai2 
—PainPai2 Pai2 (1 — Pai2 ) 


where ey = Ny; 


The term E(2/y x) cannot be further simplified because 27, is a non-linear function 
of the random effects and the expectation involves an integral with no closed form 
solution. To estimate this term we can use (as in Molina et al. (2007)) 


I 
GAGs 2s 


i=1 


where ee is &7,, with the z’s and fs within replaced by the PQL estimates # and B. 


Now we need to further simplify and approximate the matrix MSE(¢,). 
‘e eae Aa 2 I nw Y 
By definition, Tq = ee. Hai 
and each of the fi/,, can be written as functions of 
A A A t 
O44 = (G11 , 6,2) 
where 6; =X, P+Zyn 


We can now calculate a first order Taylor series approximation for each of the vectors 
fii, about the point 84;. This approximation is 


Hay ® Mai + Dai (8, ~ Oui 
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and hence 


I I 
A TA r 
bq -Tq ® Ye O a: — oe Oui 
7=1 7=1 


i : I 
= Hi (Xu B+ Zot) a (XaiB + Zait) 


7=1 7=1 
_oasar ’ 
=a gs 
where tT =KygB+Myu 
and ty =KyB+M yu, 
I 
where Kg = Das Lai X ai 
—~VW 
and My =>) aiZ ai ‘ 


Technically Kz and Mg are random variables since Xg; is dependent on wg. However, 
from this point forward we need to assume that Kg and Mg are constant and do not 
depend on wg. Note that this will be a reasonable assumption if the variance 
components are small. Now, using the fact that ?y-tg* 7) - t!,, it follows that 
MSE(Z qa) ® MSE(#')) (Molina et al. (2007) also make this assumption) and therefore 


MSE (4) ~E((@4-t4)(4-t4) | 


= E(t i +8 -14)(@-t +2 -14)), (8.5) 
where #,=K B+M ait 


and f and # are the estimates of 6 and u assuming that the variance components 91, 
g2 and 42 are known. 
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As noted in Section 5, the PQL estimators can be obtained via equations (5.1) and 
(5.2). So we can write 


B= (x'vx) xy, 
u=Wz'v"'(é-xf), 
P2208) ae (8.6) 
and w=wz'v '(é-xB), (8.7) 


where V=ZWZ!' +d71 and V=ZW2Z' +& 7". Note that Wis W but with 91, g2 and 12 
within replaced with either the approximated ML or REML estimates. Also, > is just 
=! but with B and w within replaced with the PQL estimates (which are also 
technically functions of @,, @, and @,,). 


Baillo and Molina (2005) apply a result which was derived by Kackar and Harville 
(1984) in a linear mixed model setting. We will also apply this result and assume that 
the estimator of the variance components is translation invariant. This assumption 
means that we can further simplify (8.5) to 


MSE (#4) = E((#y-#)(#4-4)' )+E((Eq—t4)(@4-70)'| 
= B((éy -#,)(@, -#,)' )+ MSE (#). (8.8) 


To approximate the first term in (8.8) we note that 7,, = %(@) and %, = %,(g), where 
9 = (91,92,912)'. This only works if we assume that $4; is not a function of @ 
(technically £, is a function of @ and B which are themselves functions of @). With 
these assumptions we can now approximate 2 ~ty using a first order Taylor series 
expansion like in Kackar and Harville (1984). Here though we are in a bivariate 
setting. 


First let %y =(t1, tp)’, then 
Oty OT, OT 
oy, OP, OP2 


Bin diy, dtp 
OP, O92 OP2 
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This then implies that 


Rie capt ropen 2 §11 812 
z( C= aN St, ‘| mz o| } 
( d a) d a) Si Je 


where for7=1,2 andk=1,2, 


{S Cin , Otay Ot 
OP, OP, OP, OY 


($2 Ob up _ Otay OF tp (0 -%)(P2-A2) 


OP, OP2 OCP OY, 


OT, OF! Oty, OF" 

Gi OT dp Gi CTar |i» n 
+ a (0. -92)(Pi2-P2).- 
(= OP2 OP OP? 


Now let for 7 = 1, 2, 


“a (Fa Oty va) 
Op OG, O92 Oz) 


then for7=1,2 andk=1,2, 


: tay OT up ; ee 
Now assuming that the elements of each of eee tee , for each combination 
Q Q 
of j and k, are approximately independent of the elements of (@— g)(@-)', then 
Ar ~ Ar ~r \b gy 21 
B((@ -4)(é4-%) -| <2 } (8.9) 
§12 822 


where for 7= 1,2 andk=1,2, 


bn =} (S| Si) |a(6-ene-e¥)] 
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This idea is motivated from Kackar and Harville (1984) and is also applied in Prasad 
and Rao (1990) and Baillo and Molina (2005). 


The term £((6-9)(¢-0) } 


can be approximated by using the inverse of the approximated observed Fisher 
information matrix. That is, if approximated ML is being used for the variance 
components then we can use the final value of -Jg! from Section 5. Or if 
approximated REML is being used for the variance components then we can use the 
final value of ~Jg' from Section 6. 


OF Wt Oey Y. 
As for the other terms E y \ Pdk 


for j= 1,2 and k= 1,2, these can be simplified further by applying some ideas from 
Prasad and Rao (1990) and Baillo and Molina (2005) with some adjustments. We do 
this now. 


Note that * can be written as follows 


ee Ea Ky p+ Mau 
Tq — e — ee 7 ; 
Ta2 Ka2B + Mg2tt 
where Kg and Kg are respectively the first and second rows of the matrix Kz. 
Similarly, M7, and M2 are respectively the first and second rows of the matrix Mg. 


For j = 1,2 we have 
ty =KyB+M git 
=Ky(x'vx) x'vé+Mywz'v(E-x8) 
=Ky Faarce X'v 1é+ MyWz'v [e-x(x'vox)" xv). 


Now substitute =X B+Zu-+e into the above equation and after some algebra and 
some simplifications we obtain 


tay =KyB 


[ka (x'vx) X'V 1 + My WZ! [v7 aaa de xv) a +e). 
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Now for a= 1,2 and 12 and/=1, 2 we have 


-1 
Ks (x'v-'x) X'v 14 


az", 
=! _— a (Zute) 
mes Mgw2! (v9 -vx(x'v4x) xy] 

o(Mywz'v' 

© | 2 Peay (8.10) 


The justification for the above approximation can be roughly explained as follows. 
The terms that we are ignoring contain the term (X‘V~!X)~!. This term is expected to 
be small and negligible if the number of small areas D is large (which is often the case 
in practise) compared to the number of parameters in B. This is because (X’V~LX)~! is 


the approximate variance of B which gets smaller as the sample size increases. 


Using the approximation (8.10) we can now write for7=1,2 and k=1,2, 
ay ENT: 
E ) Td re) Td 
0p j\ op 


#((Zu+e)(Zu+e)] 


6(Mq.Wz'v—') 


Q 


0p 


(8.11) 


a(mywz'v') |/o(Mywz'v-') a(m4wz'v-) a(mywz'v-) ry 


» 


OP, OP2 


To approximate each of the matrices given by (8.11) for7=1,2 and k = 1, 2, we simply 
calculate the derivatives and then replace the unknown parameters by their estimated 
values. To calculate the elements within the matrices we can use the following 
simplifications which we obtained after some algebra. 
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For j=1,2 andk=1,2anda=1,2,12 andb=1,2,12 


t 
ty,-l ty,-l 
a(Mqwz'v") X 6(Mq.WZ'V-') = gtytg © at, 
v9 é 
Pa Pp 


0 Pa 0 Pp 


é é Z E 
-My Mey ig Moy: "ZW Mi -— MgWZ'V'Z——Z'V'Z— My 
OP, OP, OP OPp 
+ MyWZ'VZ— OW pig ig Megige ZW My. 
OP ” 30. 


(8.12) 


So we can now fully approximate B((éy -t \(4 ao \ . 


Next we need to approximate the matrix MSE(E,)). To do this we follow almost 
directly the approach taken in Molina et a/. (2007) and after some algebra we obtain 
terms very similar to those obtained in that paper. 


By definition 
Z)-t!) =K Bh +M t—KjB-M yu. (8.13) 
I 
Let fT=v7 -vx(x'v-tx) x'v"! 
fer] -1 
and P =(x V x) 


and substitute (8.6) and (8.7) and €=Xf+Zu +e into (8.13). We then obtain 
@y— ty =(KaPX'V' + M,WZ'TI)(Zu+e)-Myu 
and hence 
(f — ty )(Ea-ta) 
=(KjPX'V"! +MgW2Z'Tl)(Zu+e)(Zu+e) (V"'XPK) +11ZW Mj) 
+ Muu! Mi, ~ Mqu(Zu+e) (V'XPK) +11ZW M4) 


-(KyPX'V'+M,WZ'T)(Zu+e)u! My. (8.14) 
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Now #((Zu+e)(Zu+ey' J=v, 


and so after the above substitutions and with some further simplifications we have 
B((é% —t1,)(ty —ty y) =K,PKi-KyPX'V 'ZWM;, 


-M,WZ'V'XPK!)+M,WM)-M,WZ'TIZWM). (8.15) 


-1 t ae 
Now let T =(w 4Z zz) (8.16) 


and we will need the following identity from Henderson and Searle (1981) (for 
example) 


V2 9777S (8.17) 


After some algebra it can be proved that 


V ZW =DZT (8.18) 


and Wo STZ (8.19) 
Using (8.16)-(8.19) we can further simplify (8.15) to 
~ '\(70 rye) _ t t t t t 
B((é —t (ty —ti,) |= MIM y +KjPKy +M {TZ uXPX 2ZTM 
—K ,PX'XZTM!, — M jTZ'=XPKY 
which on collecting terms is 
t 
B((% 2 )(@; -14)')= MygTMy +(Ky-MgTZ'EX)P(Ky-M,TZ'EX) . 


To approximate the above equation we simply substitute in the estimates of f, uw and 9. 
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We now have a formula to approximate the matrices MSE(@ 4). This approximation is 


MSE(b4) = Gy (@)+Gz(@) + 2G; ()+G, (¢), (8.20) 
where G,(9)=MgTM 4 , 


t 
’ 


G2(@) =(Ra — MgfZ'3X) P( Ry - Mgiz'Sx) 


and G2(@) is given by (8.9) with estimates substituted in and with some further 
simplifications such as (8.12). Note that G3(@) is multiplied by two in the above 
approximation. This is because as in Molina et al. (2007), E(G1(@)) © Gi(g) — G3(9) 
and we therefore multiply G3(@) by two to correct for the bias. The proof that 


E(G, (?)) =G (9) —G, (9) is given in the Appendix. 
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9. OUT-OF-SAMPLE SMALL AREAS 


All previous sections implicitly assumed that for d= 1,2,...,., each small area had 
some sample in at least one of its age/sex classes. Now suppose that as before we 
have d=1,2,...,D in-sample small areas, but we now have an additional set of R small 
areas with no sample. We can still produce estimates and MSEs for these 
out-of-sample small areas, but a slightly different methodology needs to be applied. 


In total we have d=1,2,...,D,D+1,...,D+R small areas that we wish to produce 
estimates and MSEs for. For the d=1,2,...,D small areas in sample we simply follow 
the methodology in the previous sections (discard the out-of-sample small areas for 
estimation of parameters, small area totals and MSEs). The rest of this section 
discusses estimation for the out-of-sample small areas. 


Given our model, the best prediction we have of the out-of-sample small area random 
effects (ugi,Ug2)! ford =D+1,D+2,...,D+R is (0,0)’. This is because in our model 
we are assuming that the random effects are independent between small areas and 
knowing what the observed in-sample data are does not give us any additional 
information about the out-of-sample w’s. If we had a model with spatially correlated 
random effects, then we could adapt the method outlined in Saei and Chambers 
(2005) to produce estimates and MSEs. But of course, our model does not have 
spatially correlated random effects. 


We therefore need to resort to producing synthetic estimates for the out-of-sample 
small areas. These are ford=D+1,D+2,...,D+R, 


where 
exaih 
t A t A syn 
set | ee 1+ eranhi + ex ark a Sd Dx 
Ha = "ai " = Mg; ? 
x! oB syn 
e di2F2 PH? 


1+ exaih of exarb, 


where the estimates B = (B : By are the POL estimates based on the in-sample data 
(al prea 


Most of the derivation for an approximation to the MSE matrices for the out-of-sample 
small area totals proceeds in a similar way as in the previous section. 
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First define 


exab 
t t oS 
re : 14+ era + er aiPr 7 Pai 
Ha = Na 7 = Moy *k 
exanPr Pa 
14+ exah +4 ex arb 
I I . 
s xp r r r “yr 
and let O74 = > Ha + Xv di ~ Hai + Hai — Hai 


i=1 i=1 


denote the real totals we are trying to estimate. 


i 
Now define 6g => Ma =Ta- 
i=l 


The MSE matrix of bg ford=D+1,D+2,...,D+R can now be written as 
sé (b,)= #{(8a~u)(8a-4u) }+2{ (5 ~20)(Bu 82) | 
+ B((6s-8i)(5-4a) +8 ( (Bu -81)(6a-u)'} 0.1) 


We now go about further simplifying the terms in (9.1). 


Firstly we note that 


I I I | ee 
E [Si Yo | Dp — > hi 
i=l i=l i=l i=l 


=E [a(S ta \ Sela | Zo Sui | =0 


#((6s-4,)(5r—22) )=2 baz Soa | Dv Dl } 


I le I ie i 
+E [Soir Dat | Sa Do (9.2) 
7=1 7=1 i=1 7=1 
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The first term in the above equation is simply (8.4) and to approximate this term we 
can use (assuming the variance components are small) 


u Ak 

Ry 
pee 
7=1 


syn syn syn psyn 
Shins te ea ad (1 ~ Pain Pai Pai 
where Lai = Mai oa we ee Sak (2.5) 
Pai Pair Pair (1 ~ Pai2 


The second term in (9.2) can be approximated by taking a first order Taylor series 
expansion. Let 0=XyfP+ug and 6* =X if. Then 


Har ® Ha; + 2s (8-8 ), 


= Hj +g 0.4) 


where 27, has the same form as (9.3) but with Dy and Doe replaced respectively with 
Pay and p72. Hence, 


f f * Z f * : * * 
E Sa Sli | Da Sis e My E(uquly) Mg 
i=1 i=1 i=1 i=1 


=M,W,M), (9.5) 
where M,= an 


Note that (9.5) can be approximated with M iW 4M oe 


where M d + = Mi 
and Wie= oe oe } 
Piz “Po 


Now we need to approximate 


#{(6a-84)(51-4a) |= MSE (Ea) 
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Let ibe =XyiB and 67, =Xaif. By a first order Taylor series expansion we have 


Bai * Hai + 2a (8s ~ Our} 0.6) 
and hence 
i ; 
Tq —Ta ~ 24 (Xab-XaB). (0.7) 


i=1 
N l I dt.= * FB h Ky=> wee 
ow let t7 =KZ Band t 7 =K7B, where Kq = 2 ,_,2aiXai. Now, 


MSE (é,)* MSE(ti;) 


= E(t -#4 +4 —10)(ta —F4 + 84-74) | 


= E ((@ —t4)(e 24)! )+ 2 ( (ey 24) (24 -F4)'} 0.8) 
where tT = KiB 


and to get to the last line we assume as in the previous section that the estimator of 


the variance components is translation invariant. 


The second term in (9.8) is 
b(t) Gta) = £[ (BRU) KaB-KGA) |=0. 


To see why the above term is approximately the 0 matrix, see the argument just after 
(8.10) and note that we do not have an Mg type component here. 


The first term in (9.8) is 
B (fata) ata) )=2| (KiB -KiB)(KaB-KiB) 
where B= pas) ya ame 


and X and V are as defined in the previous sections and are therefore based on the 
in-sample areas only. Again we need to assume that 27! in V does not depend on the 
in-sample random effects u to do the following. 
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After some simplifications we have 
* % . i fe re eae DOE 
Ki B-KjB=Kq(X'V'X) x'v"(Zu+e), 


where Z, u and e are defined using the in-sample areas. 


Hence we have 
B((@ ru) @a 24) } 

Sia | x'v'e((Zu+e)(Zu +e) \ye yx)” Ke 

= K’, (ven) ya ae (eae <i K:! 

= K’, (x'v-'x) KI ‘aap 
and we can approximate (9.9) with 

re Cages ae 

where pz 


and V is just V with the estimates @, B and @ substituted in as in previous sections. 


Now we need to consider the terms 


aad # (5, -54)(84 -i,) ). 


Note that bd 267 is a function of the in-sample data and is independent of Od —dg 
which is a function of the out-of-sample data. Therefore 


#{ (4 =55)(5; -8,)') =8 (8, ~5,))#( (6. -8,)') 
aad #((6, oe -5,)' =F (5, 81) { (8. _é, i 
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Now using (9.6) and (9.7) we have 


E (5g -3,) x yEy (Xu#(B)-XuB) = O24 


i=l 
since f is unbiased in a linear mixed model framework. 


Also using (9.4) 


= I ie 
E(dq-64)= ee D(x - Ha Hao) 


(a. — alae + Hen — He ) 7) 
i=1 


ty 
eae NN 
Mis 


Hence #{ (bu ~8u)(&u 84) |* 020 


and #((1-81)(6a-8,) )=220: 


We can now define an MSE estimator for the out-of-sample areas, 
ad=D+1,D+2,...,.D+Ras 


ao f N Ak A TT Re Ak A Ae u A x 
MSE(54)=Ra(X'V 7x) Ra + MW MEE SOE 


7=1 
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10. APPROXIMATE STANDARD ERRORS OF THE ELEMENTS OF £ 


In Section 8 we derived the approximate MSE matrix of an estimator of the form 


KB+Miu, where M and K are given matrices. 


If we set K to be the identity matrix and M to be a matrix containing all 0’s then we can 
apply (8.20) directly to obtain 


=( ‘vox) (10.1) 


by noting that the component G4(@) is not relevant here. The approximation (10.1) 
can be used to check the significance of the covariates in the model. Note that it is 
not appropriate to use Sp defined in Section 4 for this purpose (since it is not a 
function of W and does not account properly for the influence of the random effects). 
Note also that in this framework (of linear mixed models), B is unbiased, and so the 
approximate standard errors of the elements of B are the square root of the diagonal 
entries in (10.1). 
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11. PARAMETRIC BOOTSTRAP MEAN SQUARED ERRORS 


The mean squared error estimators in Sections 8 and 9 were derived using various 


approximations. In this section we consider an alternative mean squared error 


estimation technique based on a parametric bootstrap. This approach is very similar 


to the bootstrap method described in Molina et al. (2007). The main difference here 


is that instead of simulating wg from a univariate normal distribution, we need to 


simulate from a bivariate normal distribution. We also extend this bootstrap 


technique to incorporate out-of-sample small areas. 


The bootstrap method is outlined as follows: 


(a) 


(b) 


(c) 


Model fitting: fit the model to the original data (this will be for in-sample areas 
d=1,2,...,D), obtaining parameter estimates By and B and @. 
Generation of random effects: For d=1,2,...,D,D+1,D+2,...,D+R (includes 


R out-of-sample areas), independently generate u7, =(u%,,u%,)' from a bivariate 


normal distribution with mean 0 and covariance matrix 


Generation of a bootstrap population: ford =1,2,...,D,D+1,D+2,...,D+R, 
calculate the probabilities 


t QA * 
exh tar 


Pai = —F 4 ah bu 
1+ exan By tua +4 ex airy tan 
and 
. ek airBy tao 
Pai2z = 


t A * t a * 
1+ exaihi tla + exanrB2 tae 


and generate the following sample and non-sample multinomial vectors 
Dae (sin Jai2) ae Multinomial(¢, Pant Paiz)> 


ro a a ae t . : r * * 
Vai = (vif. Vai) ee Multinomial (mf, Pats Pair ). 
Calculate the true area totals 


5; = (11822) = D(vat 9) 
7=1 
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(d) Model fitting to the bootstrap sample and parameter estimation: fit model to the 


(€) 


(f) 
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x aX 


bootstrap sample data a for d=1,2,...,D only, obtaining estimates B is B 7) 
and predicted values Z7. Note that the pees values of wg for the 
out-of-sample areas oe 1,D+2,...,D+R are always 7, =0. From these, 
calculate individual predicted values ford =1,2,...,D+R: 


t Ae AK t Ak At 
Xa +U, Xai +U, 
Ate e aniPA dl e di2P2 d2 


Vq =My| ———— > So SS 
di di t DY. at t S* ax? t D*.aA® t D*.a*® 
14 exaih tUg) 4 eX ai2Pr +Ug2 14 erah +Ugy 4 er ah +Ug2 


Then calculate bootstrap estimates of totals by 


” vo. fy axes {40 
84 =(8ar2ar) = L(yai + 3a). 
j= 
Bootstrap replicates: repeat steps (b)—(d) B times. Let ow) and ow) denote the 
true values of the parameters and é *0) and 6 ae the estimators that are 
obtained in the b-th repetition, b=1,2,...,B. The bootstrap estimators of the 
mean squared error matrices E((6a ~da)(a ~64)'), ford=1,2,...,.D+Rare 


#{(6a-84\(80-64) |= 


B B 
oY (55 SO) ~5,Y wy (55 SO _ 54)(555 () -5) 


B 
-1 $6) SO)! SO _ sO) pA (SO _s*@ 
B > (54 ag (3 - 5) B > (si om By 


b=1 
Other estimates: after step (e) it is also possible to estimate mean squared errors 
of other parameter estimates such as B and @. For example, after implementing 
the repetitions b = 1,2,...,B we can also approximate root relative mean squared 
errors of each element in B , and Bp. Let index k denote the k-th element of f1 
(or Bz). Let Bw denote the value from step (a) and B te denote the bootstrap 


estimate at iteration b. A bootstrap estimate of the RMSE(B 1(R)) is 


B 2 
= 5*(b) a 
(e ¥ (Ai) Aw) 
=] 


and this can be compared directly with the standard error estimate from 
Section 10. 
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12. AUXILIARY DATA 


The previous sections described the theory behind producing small area estimates of 
labour force counts and estimating mean squared errors. We now apply this theory to 
real data. 


The small areas in our model represent LGAs (Local Government Areas) which have 
boundaries defined according to the 2001 ASGC (Australian Standard Geographical 
Classification). These are chosen because they provide sufficiently fine geographical 
areas, but still have adequate samples in the area for analysis. The total number of 
in-sample areas in August 2001 and August 2006 are respectively 424 and 413 and the 
total number of out-of-sample small areas in August 2001 and August 2006 are 


respectively 220 and 231. 


The first step is to fit an appropriate model to the in-sample data and produce PQL 
estimates of wz and Band approximated REML or ML estimates of g. One essential part 
of this first step is to choose an appropriate set of explanatory variables to include in 
the models. That is, we need to fully define xg and xg. 


Auxiliary data are available from a variety of different sources. The two main sources 
are administrative Centrelink benefit payment data from DEEWR (Department of 
Education, Employment and Workplace Relations) and the Australian Census of 
Population and Housing. These data are available for the time points August 2001 and 
August 2006 and so it will be possible to fit two separate models with the same 


explanatory variables for the two different time points as a comparison. 


There are quite a large number of potential explanatory variables that could be used in 
the models. Ideally some kind of model variables selection process would need to be 
undertaken, however we do not consider this approach here. Instead we note that 
the ABS has already produced experimental estimates of small area labour force 
counts using three separate binomial logit mixed effects models (using the 
methodology outlined in Saei and Chambers, 2003). A careful model selection 
process was applied in this case to determine an appropriate set of explanatory 
variables for the binomial logit mixed models. For consistency and comparative 
purposes we will use the same set of explanatory variables that were used in these 
earlier models. Therefore each of our vectors xg; and xgj2 will contain the same set 
of 37 variables each. In summary, these variables are benefits payments variables, 
state indicators, age/sex indicators, remoteness indicators, socio-economic indexes for 


areas, and household type. 
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The following list outlines the explanatory variables in more detail. 
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AS1—AS10: ten age/sex indicators. The five age groupings are 15-24 years, 25-34 
years, 35-44 years, 45-54 years and 55—64 years. Note that we only consider the 
working age population for this analysis. AS1—AS10 are defined in order of 
increasing age and all odd AS groups are males. The base category is AS1. 


STATE1-STATES: eight state indicators. These refer in order to New South 
Wales, Victoria, Queensland, South Australia, Western Australia, Tasmania, the 
Northern Territory and the Australian Capital Territory. The base category is 
New South Wales. 


NSA_YAO: Proportion of population in the class registered to receive full 
payment of Newstart Allowance (unemployment benefits) or Youth Allowance 
(other). 


ASPAY1—ASPAY10: ten age/sex indicator by PAY interactions. PAY is the 
proportion of the population in the class registered to receive full other benefit 
payments. For example, disability support pension, parenting payments, partner 
allowance, wife pension, etc.. Note that this set up implies that the effect that 
PAY has on the probability is different for each age/sex class. 


REMOTE1-—REMOTE23: three remoteness indicators. These refer to major city 
(REMOTE1), non-remote area (REMOTE2) and remote area (REMOTE3) as per 
the ASGC (Australian Standard Geographical Classification) 2001 (see ABS (2001) 
for further details). The base category is REMOTE1. 


SEIFA1—-SEIFA4: four socio-economic index of advantage-disadvantage 
indicators. SEIFA1 indicates advantaged areas (whether the area is in the top 
25% of SEIFA scores), SEIFA2 indicates the next 25%, SEIFA3 the next 25% and 
SEIFA4 indicates the most disadvantaged areas (whether the area is in the 
bottom 25% of SEIFA scores). The base category is SEIFA1. For further details 
see ABS (2003). 


HH1: Proportion of Census population in class that lives in dwelling consisting of 
married couple only or married couple with at least one child aged 15 or over. 


HH2: Proportion of Census population in class that lives in dwelling consisting of 
married couple with children all aged 0 to 14 years. 


HH3: Proportion of Census population in class that lives in dwelling consisting of 
one person only or one person with at least one child aged 15 or over. 


HH4: Proportion of Census population in class that lives in dwelling consisting of 
one person with children all aged 0 to 14 years. 
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Before we fit the multinomial logit mixed models, we undertake a brief exploratory 
data analysis using the August 2006 LFS and Census data to examine the suitability of 
our above chosen explanatory variables. Figure 12.1 contains average proportions of 
employed and unemployed within the ten AS groups, the eight STATE groups, the 
three REMOTE groups and the four SEIFA groups. The labour force proportions 
(especially employment) depend strongly on age and sex since the mean proportions 
vary across these categories. The AS indicators should certainly be considered as 
explanatory variables. Figure 12.1 also shows that the relationships between the 
labour force average proportions and each of STATE, SEIFA and REMOTE are in 
general not as strong as age/sex but there is still some variation. 


12.1 Plot of average labour force proportions within certain groups for 2006 
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The variables we have examined so far are the categorical variables. We now examine 
relationships between log(ygj1/yq3) for the cases yqjz > 0 and yg > 0 and the 
continuous explanatory variables for 2006. Note that we do not do this for ygjz since 
there are a large number of zeros in this case. 


Figure 12.2 contains plots of logQvgi/yaiz) versus each of HH1, HH2, HH3 and HH4. 
From these it appears there might be a weak association, especially for HH1 and HH3. 
A similar plot for the variable NSA_YAO is given in figure 12.3, although we note that 
the linear relationship in this case does not look very strong. However, note that 
irrespective of this observation, a relationship may still exist between log(y qj2/y a3) 
and NSA_YAO and this is the reason why this variable is still included. 
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12.2 Plot of log(yai1/yai3) versus each of HH1, HH2, HH3 and HH4 for 2006 
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12.3 Plot of log(yai1/yaiz) versus NSA_YAO for 2006 
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12.4 Plot of log(ygi1/Vaiz) versus PAY within AS groups for 2006 
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Figure 12.4 contains a plot of log(ygj1/yaiz) versus PAY within AS groups. There 
appears to be a negative relationship between log(vq1/a3,) and PAY. However, note 
that the slopes appear to differ between AS groups. This is the reason why we include 
the PAY variable in the model as an interaction between each of the AS indicators. 
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13. ESTIMATES OF MODEL PARAMETERS 


Tables 13.1 and 13.2 contain the PQL estimates of B; and £2 for the years 2006 and 
2001. Estimates of the standard errors of the parameter estimates and p-values are 
also given in these tables. The p-values are calculated assuming that the distribution 
of the parameter estimate is approximately normal. Estimates which are significant at 
the 0.05 level are marked with a *. Not all parameter estimates appear to be 
significant, but as we stated in the previous section, all variables are still included for 
consistency with the previously fitted binomial logit mixed models. As expected, for 
employment (and this holds for both years), the AS group indicators and ASPAY 
variables are highly significant. For both years, the variable NSA_YAO is highly 
significant for unemployment. This is what we would expect as NSA_YAO is related to 
unemployment. 


The standard errors in tables 13.1 and 13.2 use the analytical approximation (10.1) 
which has many assumptions behind it. As a comparison, we also calculate parametric 
bootstrap RMSEs using the algorithm described in Section 11 (in particular see step 
(f)). We implement the parametric bootstrap using the 2006 data, with the simulation 
size set to B= 1000 and use the combined PQL-REML algorithm for parameter 
estimation. A comparison between the 74 estimated analytical standard errors and the 
parametric bootstrap estimated root mean squared errors is given in figure 13.3 
(figure 13.4 contains the smaller standard errors in the range 0-0.5 only). From these 
plots we can see that the differences between the estimated SEs and RMSEs are very 
small. Also in all cases we find that the bias component of the bootstrap RMSEs is 
small (<1.5%). Therefore the analytical SE estimates of B appear to be good 
approximations. 


The PQL parameter estimates in tables 13.1 and 13.2 use the approximated REML 
method to estimate the variance components g. As an alternative to this we could 
also have used the approximated ML method. Table 13.5 contains estimates of the 
variance components for both August 2001 and August 2006, using the two different 
estimation methods. To estimate the RRMSEs of the approximated ML and REML 
estimators, the parametric bootstrap of Section 11 (with B = 1000) can be used with 
some slight modifications that we now briefly discuss: 


° In step (a) of the algorithm, compute the PQL-REML estimates. These will be 
conditioned on in the rest of the simulation 


° In step (d) we need to compute both the REML and ML estimates of the variance 
components. 


: At the conclusion of the simulation we have 1000 REML and 1000 ML estimates 
of g. These can then be used to calculate estimates of bias and RMSEs. 
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13.1 PQL Estimates of 8; and {2 for 2006 (REML used for variance components) 
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-5.430 1.210 -4.50 < 0.0001 
—4.060 0.843 -4.81 < 0.0001 
17.100 2.720 -6.30 < 0.0001 
—-4.510 0.739 -6.10 < 0.0001 
16.300 1.740 -9.39 < 0.0001 
-4.790 0.735 -6.52 < 0.0001 
10.500 1.330 -7.94 < 0.0001 
-5.410 0.844 -6.41 < 0.0001 
6.390 0.638 -10.00 < 0.0001 
-4.720 0.566 -8.34 < 0.0001 


1.510 0.362 4.18 < 0.0001 
1.090 0.924 1.18 0.2380 
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13.2 PQL Estimates of 8; and #2 for 2001 (REML used for variance components) 


Employed Unemployed 
Variable Estimate SE Z p-value Estimate SE Z p-value 
Intercept 0.779 0.204 3.82 0.0001 * -2.610 0.370 -7.05 < 0.0001 * 
STATE2 0.024 0.045 0.53 0.5980 0.122 0.084 1.45 0.1470 
STATES 0.134 0.048 2.74 0.0061 * 0.330 0.086 3.83 0.0001 * 
STATE4 0.045 0.056 0.81 0.4170 0.119 0.104 1.15 0.2500 
STATES 0.039 0.052 0.75 0.4520 0.146 0.095 1.53 0.1260 
STATEG -0.018 0.071 -0.25 0.8020 0.220 0.124 1.77 0.0767 
STATE7 0.075 0.124 0.61 0.5450 0.135 0.219 0.62 0.5370 
STATES 0.145 0.096 1.51 0.1310 0.101 0.177 0.57 0.5670 
AS2 -0.089 0.154 -0.59 0.5540 -0.148 0.267 -0.55 0.5800 
AS3 1.960 0.182 10.80 < 0.0001 * 1.650 0.287 5.77 < 0.0001 * 
AS4 0.781 0.147 5.31 < 0.0001 * 0.200 0.282 0.71 0.4770 
ASS 2.460 0.200 12.30 < 0.0001 * 1.800 0.344 5.24 < 0.0001 * 
ASG 0.715 0.180 3.98 < 0.0001 * -0.133 0.358 -0.37 0.7110 
AST 1.720 0.154 11.40 < 0.0001 * 0.966 0.265 3.64 0.0003 * 
AS8 0.415 0.135 3.08 0.0021 * -0.767 0.296 -2.59 0.0096 * 
AS9 0.080 0.143 0.56 0.5770 -0.795 0.315 -2.52 0.0117 * 
AS10 -0.766 0.154 -5.09 < 0.0001 * 2.680 0.506 -5.30 < 0.0001 * 
REMOTE2 0.036 0.058 0.62 0.5330 0.017 0.104 0.17 0.8690 
REMOTES 0.138 0.042 3.27 0.0011 * 0.077 0.075 1.02 0.3080 
SEIFA2 -0.053 0.046 -1.15 0.2500 0.039 0.083 0.48 0.6350 
SEIFA3 —0.069 0.060 -1.16 0.2460 —0.083 0.108 -0.77 0.4400 
SEIFA4 -0.084 0.068 -1.24 0.2150 0.141 0.122 -1.15 0.2500 
ASPAY1 -3.830 1.010 -3.79 0.0002 * 0.201 1.650 0.12 0.9030 
ASPAY2 -2.370 0.746 -3.18 0.0015 * 0.658 1.340 0.49 0.6230 
ASPAY3 -9.230 2.630 -3.51 0.0004 * -7.110 3.780 -1.88 0.0601 
ASPAY4 -4.730 0.819 -5.78 < 0.0001 * -1.030 1.680 -0.61 0.5410 
ASPAY5 —13.800 1.770 -7.78 < 0.0001 * -9.390 2.850 -3.29 0.0010 * 
ASPAY6 —4.400 0.746 -5.90 < 0.0001 * -0.604 1.680 -0.36 0.7200 
ASPAY7 —11.200 1.300 -8.62 < 0.0001 * -8.210 2.380 -3.45 0.0006 * 
ASPAY8 —5.590 0.665 -8.41 < 0.0001 * -1.780 1.760 -1.01 0.3120 
ASPAY9 -4.460 0.507 -8.81 < 0.0001 * -1.430 1.260 -1.14 0.2540 
ASPAY10 -6.020 0.614 -9.82 < 0.0001 * 0.843 2.370 0.36 0.7230 
NSA_YAO -0.697 0.897 -0.78 0.4370 7.510 1.450 5.16 < 0.0001 * 
HH1 0.164 0.217 0.75 0.4510 0.741 0.394 1.88 0.06014 
HH2 -0.819 0.303 -2.70 0.0069 * -0.911 0.557 -1.63 0.1030 
HH3 1.580 0.457 3.45 0.0006 * 2.580 0.938 2.75 0.0060 * 
HH4 0.274 1.310 0.21 0.8360 3.610 2.660 1.36 0.1740 


54 ABS ¢ SMALL AREA ESTIMATION USING A MULTINOMIAL LOGIT MIXED MODEL * 1351.0.55.029 


13.3 Plot of analytical SEs versus bootstrap RMSEs of B for 2006 
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13.4 Plot of the small analytical SEs versus bootstrap RMSEs of B for 2006 
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13.5 Estimates of g 


2001 2006 
Parameter ML REML ML REML 
P1 0.0238 0.0280 0.0683 0.0758 
P2 0.0468 0.0595 0.0691 0.0853 
P12 0.0105 0.0125 0.0424 0.0466 
P 0.316 0.307 0.617 0.580 


Table 13.6 contains the bootstrap results for g for August 2006. In terms of bias, the 
REML estimator is performing reasonably well since all relative bias estimates are <2% 
in absolute value. From these results it is clear that the REML estimator is to be 
preferred over the ML estimator since not only are the estimated biases smaller (as 
one might expect), but the RRMSE estimates are also smaller for the REML estimator. 


13.6 Comparison of the REML and ML estimators of g for 2006 


Relative bias (%) RRMSE (%) 
Original eee eee ee ee eee eee ee ee ee ey 
Parameter estimate ML REML ML REML 
Pi 0.0758 -9.15 -1.56 15.56 13.22 
P2 0.0853 —19.54 —1.85 31.87 27.08 
P 0.580 6.71 0.80 22.68 20.15 


It is also of interest to test the significance of the variance components @ (since if 
these are not significantly different from zero, then there is no point using a mixed 
effects model). In Molina et al. (2007), the authors test the significance of their single 
variance component using a likelihood ratio test (which is based on the approximate 
marginal likelihood). In theory we could apply a similar test here, but unfortunately 
the distribution of the likelihood ratio test statistic in our case will not be easy to 
derive under the null hypothesis. This is because under the null hypothesis 

9 = (91,92,912)' =0 and g is on the boundary of the parameter space which is a non 
standard condition. Molina et al. (2007) were able to apply this test because in the 
case of one variance component, the null distribution is known to be a mixture of two 
7? distributions. 
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13.7 Histograms of (,, @, and p for 2006 and 2001 


200 


Frequency 
Frequency 


0 100 200 300 


é 
2 
2 
i fees 
Oo | 
i} T T T T T 1 i} T T T 1 
0.05 0.06 007 0.08 0.09 0.10 0.11 0.01 0.02 0.03 0.04 0.05 
2006 w 1 hat 2001 w 1 hat 
= 
7 8 
2 g 
B 6 FA 
5 8 4 5 8 
2 2 J 
2 6 2 
rae go 
[ites : L 
aa a eee _, 
a a | 
0.02 0.04 006 008 010 012 0.14 0.02 0.04 0.06 0.08 0.10 
2006 w2 hat 2001 w 2 hat 
= 
~ 3 — 
Bo 3 8 
z * . 
2 2 
“u 8 TT “oo 
8 
o — oO —_——__ 
T T T T 1 T T T P T h 1 
0.2 0.4 06 08 1.0 04 -0.2 0.0 0.2 04 0.6 08 
2006 rho hat 2001 rho hat 


In any case, as an alternative, the previous parametric bootstrap used to produce table 
13.6 will give some insight into the significance of the variance components. Using 
the 1000 REML estimates of each parameter, we build up empirical distributions under 
the fitted model of the REML estimators. Histograms of the 1000 sets of REML 
variance component estimates are given in figure 13.7. From these we can calculate 
approximate 95% parametric bootstrap confidence intervals using the percentile 
method. These confidence intervals are given in table 13.8. Notice that the 
confidence intervals for g; and @2 for both years are not close to zero and the 
distributions look roughly symmetric, giving some evidence that the variance 
components are significant and a random effects model is appropriate in both years. 


13.8 Approximate 95% confidence intervals for @,, @, and / for 2006 and 2001 


2006 2001 
Parameter Lower Upper Lower Upper 
ae a sieiniaisiae’s Bere dvi vhleie gene pes sida'y dines : pa 
P2 0.0439 0.1280 0.0272 0.0937 
p 0.371 0.788 -0.048 0.581 
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If we were to apply the model in Molina et a/. (2007) to our LFS data, we would be 
imposing that g=(91,91,91)'. Tables 13.5 and 13.8 give some evidence that this more 
restrictive model may not be appropriate especially for the 2001 data. To test the null 
hypothesis of g = (91,91,91)' for the 2006 data, a parametric bootstrap is used to 
generate empirical distributions under the null. The algorithm in Section 11 is used 
for this purpose, but note that we need to make one small change. Since we need to 
simulate under the null, in step (c) we replace all instances of 7, with uw, to ensure 
there is only one variance component. A histogram of the empirical distribution of p 
under the null hypothesis is given in figure 13.9 for the 2006 data. There is strong 
evidence to suggest that p= 1 does not hold since the observed / is 0.580 and is 
nowhere near any of the simulated f values under the null. Therefore our category 
specific random effects model appears to be more appropriate for our data than the 
more restrictive model used in Molina et al. (2007). 


13.9 Histogram of / under the null for 2006 
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14. RESIDUAL PLOTS AND GOODNESS OF FIT TESTS 


In the case of the multinomial logit model with no random effects or variance 
components, it is usual to assess model fit by computing various different summary 
statistics, measuring differences between the observed and fitted values. For example, 
as stated in Dobson (2001), one could use the Pearson 7? statistic, Deviance or the 
Likelihood ratio chi-squared statistic. 


Suppose we have no random effects or variance components @g (the model is a GLM 
(generalised linear model)), then the standard Pearson y? goodness of fit statistic is 


2 


j=ld=li=1 Fay 


where Eg is the estimated expected value of yg. The Pearson x goodness of fit test 
is used to determine whether or not the model appears to hold. Given that the model 
holds and under appropriate asymptotic conditions (expected counts Egy are large), 
the Pearson 7? statistic will follow an approximate Gww-p) distribution, where 

N= ?_, Iq = 7,820 is the total number of multinomial observations for the August 
2006 data (similarly we could also compute this for August 2001) and p = 37 is the 
number of parameters in B; (or B2). A large or small value of Xp indicates that overall 
the model does not appear to hold (the null hypothesis is that the model holds). 


Unfortunately we cannot apply the above test to our data since our model is not a 
GLM. The application of goodness of fit tests in a GLMM (generalised linear mixed 
model) situation is not straight forward theoretically. For instance, the observations 
are no longer independent and the Pearson statistic and other statistics are not 
guaranteed to have a y* distribution under the null hypothesis even when the 
expected counts are large. Also, another problem is that the estimates Egy are not 
easy to calculate as they involve integrals with no closed form solution. 


There are two approaches that we consider to get around the above issues. The first 
is to use as Eg, the conditional expectation estimates mp dij predicted from the 
model and the second is to use an estimate of E(y qj) from a Taylor series expansion 
about wz,4 =0, since in our case the variance components are small. The distribution of 
these 7? statistics can then be estimated by applying the parametric bootstrap of 
Section 11 using B = 1000. 


For our multinomial logit mixed model sample data we have 


E(vay)= E(E(yay |x) =mgE( Dai) (14.1) 
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and 
Var (va ) = E(Var( say | wa) +Var(E (yay | ua)) 


= maiE( Pay )(1 = E( Day )) + mg; (Mai —1)Var (Pay ): (14.2) 


It can be shown using Taylor series expansions that when the variance components 


are small, 
P.Pain (1 a 2 Pan )( - Pai 
E( Pain) * Pai +5 +P (puiaPan a PanPai2 (1 — Pair ) (14.3) 
+20 2 (2bci Paz — PanPai2 
PY (pin Pine = PairPaa (1 = Pan ) 
E( Paiz) ® Pare +5 +o Pai (1 — 2a (1 = Pair) (14.4) 
+2012 (2pci2Pan = PanPai 
P\ (ruin Pins — PaisPan (1 — Pair ) 
and E (Pais) & Pais +5 +P ( puiabins — PaisPair (1 = Pair ) (14.5) 


+402 Pain PairPais 


where P71, P72 and p73 are respectively Pagi1, Paiz and paz with ug replaced with the 
zero vector (the Taylor series expansions were taken about the point u7=0). Also, 
we can show that 


Var ( Pai ) a D1 Dai (1 — Pa af OrDain Pai = 20,2P ai (1 — Pg ) Pave (14.6) 
Var ( Pair ) © PD Pai Pdi2 + P2Pdi2 (1 — Pai2 — 292 Pai2Pair (1 — Pai2 (14.7) 


*2 *2 *2 *2 * * *2 
and Var (Pais) ® D Pai Paix + P2Pdi2Pais + 2P12PanPai2Pa3: (14.8) 
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We are now at a point where we can approximate Fay in the x? Statistic using 
estimates of E(yqj) based on the Taylor series approximations (14.3)—(14.5) and 
noting the relationship (14.1). Note that within these approximations, the parameters 
fand gare replaced by their estimates. Call this the unconditional approximation 
method. The other method as we mentioned earlier is to use the estimates of the 
conditional expectation EQaij | u 4) for Egy in the 7? statistic. Call this method the 
conditional method. 


14.1 Histogram of the 77 statistic using the conditional method for 2006 
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Using the parametric bootstrap of Section 11, we generate empirical distributions of 
the two 7? statistics under the fitted model based on a simulation of size B = 1000. 
The two distributions are summarised in figures 14.1 and 14.2 for the August 2006 
data. An approximate 95% confidence interval for the conditional method is (7035, 
7708) and an approximate 95% confidence interval for the unconditional 
approximation method is (8045, 8775). The values of the 7? statistic for the sample 
data are 7669 (conditional method) and 8703 (unconditional approximation method). 
Clearly these values are within the appropriate approximate confidence intervals, 
suggesting that overall both the models for yg and yay |uq are adequate for the 
August 2006 sample data (similar conclusions can also be drawn for the August 2001 
data, but details are not given here). 
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14.2 Histogram of the x? statistic using the unconditional approximation method for 2006 


Histogram of Chi-squared statistic 
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One issue associated with goodness of fit tests is that they only produce one overall 
summary measure. It is of interest to also examine individual deviations and look for 
outliers/influential points. Residual plots are useful for this purpose. For each labour 
force status / = 1, 2,3, define the following conditional standardised residuals for the 
in-sample data (Molina et al. (2007) call these Pearson residuals) 


é Vai ~™ ai Pai 


"ij = = 5 : 
Mai Pai (1 — Pai ) 


We can also calculate approximate unconditional standardised residuals as follows 


uc _ J aij -E (yay) 
nn dl 
Var ( yay) 


where F ( Vai ) and Var( Vay) are estimates based on the earlier Taylor series 
approximations (14.3)—(14.8) and the relationships (14.1) and (14.2) with parameters 


Band @ replaced with estimates. 
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Interestingly, we can also calculate the following summary statistics for each labour 
force category j= 1,2,3 


2 
86 => (ri) 
d=li=1 
D Iq 2 
and si = > (rif) 
d=1i=1 


and compare these with appropriate empirical distributions generated from a 
parametric bootstrap under the fitted model with B = 1000 (again we use the 
bootstrap approach in Section 11). The empirical distributions are given in figure 
14.3. Table 14.4 contains the S© and S“ values for the August 2006 sample data as well 
as approximate 95% parametric bootstrap confidence intervals using the percentile 
method. None of the S° or S“ values are within their corresponding confidence 
intervals. These statistics indicate that there may be underdispersion present for the 
Unemployed counts and overdispersion for both the Employed and NILF counts. 
Previously when we calculated the overall v7 statistics we did not obtain significant 
values. This is because the over and underdispersion in a way averaged themselves 


out overall. 


14.3 Histogram of the S© and S“ statistics for 2006 from a parametric bootstrap 
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14.4 S° and S“ values and associated parametric bootstrap 95% confidence intervals 


Conditional Unconditional 
LFS status SE Lower Upper Su Lower Upper 
Employed 4,224 3,522 3,848 4,306 3,741 4,030 
Unemployed 3,393 3,407 4,000 3,565 3,578 4,137 
Not in labour force 4,253 3,503 3,842 4,407 3,736 4,043 


We now examine some residual plots to look more closely at this potential under and 
overdispersion issue. Figure 14.5 contains the standardised residuals rf, versus the 
order of increasing predicted values maf ,,, for the 2006 employed sample data. The 
panel labelled original contains the original sample and the other three panels contain 
values calculated from three parametric bootstrap samples. Similarly, figure 14.6 
contains the unconditional standardised residuals r#;, versus the order of increasing 
predicted values (yg). Similar plots for NILF and unemployed are given in figures 
14.7-14.10. 


When comparing the bootstrap generated employed and NILF samples with the 
original data, it appears that the original data contain a small number of larger 
absolute residual values for both Employed and NILF. We actually recalculated the S© 
and S“© statistics by setting these few larger absolute residuals to zero for employed 
and NILF, but the resulting S© and S“ statistics were still significantly large. Hence 
these couple of larger absolute residuals are not solely responsible for the apparent 
overdispersion. In any case, when ignoring the small number of larger absolute 
residuals, the sample and original data distributions look roughly similar and hence we 
argue that the apparent under and overdispersion is not large enough for us to be 
overly concerned. In the significance tests we are clearly only picking up small 
significant differences and we suspect this is because our sample sizes are large. 


We mentioned above that there were a small number of larger residuals in absolute 
value than one might expect when compared to the bootstrap samples. Most of these 
do not appear to be overly large and the NILF and employed ones mostly correspond 
to the same units. Upon further investigation, the only thing these outliering units 
have in common is that they are mostly all from remote areas (REMOTE3= 1). 
Therefore the model may not be doing as well in the remote areas. Note that there is 
not much we can do about this issue because the sample sizes are small in the remote 
areas and we do not have any further covariates available. 
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14.5 Plots of the conditional employed standardised residuals versus order of predicted values 
for original 2006 data and for three parametric bootstrap simulations 
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14.6 Plots of the unconditional employed standardised residuals versus predicted 
values for original 2006 data and for three parametric bootstrap simulations 
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14.7 Plots of the conditional NILF standardised residuals versus order of predicted 
values for original 2006 data and for three parametric bootstrap simulations 
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14.8 Plots of the unconditional NILF standardised residuals versus predicted values 
for original 2006 data and for three parametric bootstrap simulations 
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14.9 Plots of the conditional unemployed standardised residuals versus order of predicted values 
for original 2006 data and for three parametric bootstrap simulations 
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14.10 Plots of the unconditional unemployed standardised residuals versus predicted 
values for original 2006 data and for three parametric bootstrap simulations 
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14.11 Q-Q plots of Zi,7; for the original 2006 sample and three bootstrap samples 
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14.12 Q-Q plots of 77,72 for the original 2006 sample and three bootstrap samples 
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So far we have said nothing about the predicted random effects #. Figure 14.11 
contains a plot of the quantiles of standardised #,, values versus theoretical standard 
normal quantiles. This is done for the original 2006 sample and three samples 
generated using the parametric bootstrap. Similarly, figure 14.12 contains a plot of the 
quantiles of standardised @ 2 values versus theoretical standard normal quantiles. 
Figure 14.13 contains a plot of fg, versus fig2. From these figures it appears that the 
original 2006 sample tg values behave similar to those obtained from ‘typical’ 
samples. The estimated #g values from the original sample therefore do not appear 
to give any indication of model departure. 


14.13 Plots of ti; versus éi72 for the original 2006 sample and three bootstrap samples 
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15. SMALL AREA ESTIMATES AND MSE ESTIMATES 


Similar to figure 8 in Molina et al. (2007), figure 15.1 contains plots of the ratios of 
direct RSE estimates to model analytical RRMSE estimates versus sample sizes for 
2006. The line y = 1 is also plotted. A ratio greater than 1 indicates we get gains by 
using the model based approach, whereas a ratio less than 1 indicates we get gains by 
using the direct survey estimation approach. Since all ratios are greater than 1 we are 
always getting gains by using the model based approach. The gains are quite large 
when the sample sizes are small and are small when the sample sizes are larger. 
Therefore when the sample size is small, the model based estimates have much small 
estimated MSEs than the direct survey estimates. Hence we have successfully reduced 
the MSEs by using a model based approach. 


15.1 Plots of the ratios of direct RSE estimates to model 
analytical RRMSE estimates versus sample sizes for 2006 
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Figure 15.2 contains a comparison between the model based estimates (8.2) and the 
direct survey estimates for the August 2006 in-sample small areas. Note that the 
estimates of the NILF small area totals are obtained via subtraction. These plots are 
useful as a check for bias. For confidentiality reasons we have removed the actual 
numbers on the plots. The three plots on the left contain all of the estimates, whereas 
the plots on the right contain only the estimates closer to zero. The line y =x is also 
drawn on these plots. Note that the direct survey estimates should be approximately 
design unbiased but with large standard errors. Figure 15.2 suggests that the model 
based estimates for Employed and NILF are roughly unbiased since although there is 
some variation, the estimates are distributed roughly about the line y =x. The 
unemployment model based estimates appear to be a little worse in some cases. For 
instance, when the direct estimates are large, the model based estimate tends to be 
smaller. However, for the most part, the model based estimators appear roughly 
unbiased or have a small bias. 


15.2 Comparison between model based small area estimates 
and direct survey estimates for 2006 
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We now consider mean squared error estimation for the estimated small area totals. 
The mean squared error matrices can be estimated using two methods, either by an 
analytical approximation or a parametric bootstrap. For further details on the 
analytical approximation and the parametric bootstrap see Sections 8,9 and 11. A 
comparison of the average percentage RRMSEs for 2006 derived from the analytical 
approximation and the parametric bootstrap with B = 1000 is given in table 15.3. On 
average the differences between the two methods are very small. The largest average 
absolute difference is for unemployment and this is only 0.83% and 0.62% for 
respectively the in-sample and out-of-sample small areas. 


15.3 2006 Average RRMSE (%) estimates 


In-sample areas Out-of-sample areas 
LFS status Bootstrap Analytical Bootstrap Analytical 
Employed 4.84 4.86 7.44 7.47 
Unemployed 24.52 23.69 32.30 31.68 
Not in labour force 13.34 13.26 19.66 19.72 


Figure 15.4 contains an overall comparison between the analytical and parametric 
bootstrap RRMSEs for August 2006. The line y =x is also plotted. Figure 15.4 also 
confirms that on average the RRMSEs from both methods compare well since the 
values are roughly distributed about the line y=. The in-sample unemployment 
analytical RRMSE estimates appear a little worse than the others on average since the 
analytical approximation looks to be on average slightly overestimating the smaller 
RRMSEs and underestimating the larger RRMSEs. 


Table 15.5 contains the 2.5th and 97.5th percentiles of the distribution of the 
differences between the 2006 parametric bootstrap and the analytical RRMSE 
percentages. Figure 15.4 and table 15.5 shows that there is some variability in the 
differences between the parametric bootstrap and analytical RRMSEs. However, this 
variability is not overly large with the worse case being for the in-sample unemployed 
and the majority of these differences are < 4% in absolute value. Note also that some 
of these larger differences could also be due to extra variation resulting from the 
parametric bootstrap since B = 1000 is only a moderately large value. 
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15.4 Comparison between analytical and bootstrap RRMSEs 
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15.5 2.5th and 97.5th percentiles of the distribution of the differences between 
the 2006 parametric bootstrap and analytical RRMSE percentages 


In-sample areas Out-of-sample areas 
2.5th 97.5th 2.5th 97.5th 
LFS status percentile percentile percentile percentile 
Employed -0.50 0.50 —0.37 0.32 
Unemployed -2.54 4.24 -1.03 2.45 
Not in labour force -1.50 2.12 -0.95 1.00 
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16. CONCLUSION 


This paper successfully adapted the model and estimators described in Molina et al. 
(2007) to include category specific random effects. We showed that the category 
specific multinomial logit mixed model is more appropriate for our dataset than the 
more restrictive one given by Molina et al. (2007). The PQL-REML estimation 
procedure worked very well in our context and we showed via a parametric bootstrap 
that the PQL-REML estimators had good statistical properties. For instance, the bias in 
the REML variance component estimates was found to be small. 


Similar to Molina et al. (2007), we described and derived two different estimators of 
the mean squared errors of the small area estimated totals. These two different 
approaches are based on using an analytical approximation and a parametric 
bootstrap. In the paper by Molina et al. (2007), the authors undertook a simulation 
study and concluded that the bootstrap estimator performed better than the analytical 
approximation and recommended the bootstrap be used. However they noted that 
the differences were smaller for the actual UK unemployment data. We showed that 
for the Australian labour force data that the analytical approximation RRMSEs 
compared very well with the parametric bootstrap RRMSEs and the differences were 
all reasonably small. In our context we recommend that the analytical RRMSEs be 
used because our parametric bootstrap is much more computationally intensive than 
the one given in Molina et al. (2007). We believe the small gains in accuracy will not 
be worth the extra computational effort involved for the parametric bootstrap in our 
case. 


In this paper we used residual plots and y? goodness of fit tests to check model 
assumptions. We used a parametric bootstrap to generate the empirical distributions 
of the 7? statistic. These tests and plots indicated that the model assumptions 
appeared to hold approximately for the sample data. There was very slight under and 
overdispersion present and a couple of small outliers for remote areas. In a future 
study we might try to improve the model for remote areas. In any case, for the most 
part the multinomial logit mixed model appears to work reasonably well for modelling 
the Australian Labour Force count data. Interestingly, the multinomial model has 
quite a restrictive variance and correlation structure and the fact that the multinomial 
model works so well here is very convenient. This is because extending the model to 
account for under and overdispersion in a small area context would not be straight 
forward. This would certainly be an interesting topic for future research. 


Another future research topic could be to try account better for the sample design and 
any design informativeness. Our estimators like those in the Molina et al. (2007) 
paper essentially assume that the Labour Force sample has been collected using 
SRSWOR. 


74 ABS ¢ SMALL AREA ESTIMATION USING A MULTINOMIAL LOGIT MIXED MODEL * 1351.0.55.029 


ACKNOWLEDGEMENTS 


This work was initially intended to form a small introductory part of my PhD thesis 
and has since evolved into a separate more detailed joint ABS and ANU research 
paper. As such, I would like to acknowledge my PhD supervisor Professor Alan Welsh 
for his technical advice and continual support throughout this work and for putting 
me onto this topic in the first place. 


I would also like to thank my small area estimation colleagues, in particular Daniel 
Elazar for their support and help and for allowing me to work on this topic part time 
at the ABS. Also, I would especially like to thank those ABS staff who manipulated the 
data and got it into a useable form which I could then run the multinomial logit mixed 
models on. 


I would also like to acknowledge DEEWR, the Department of Education, Employment 
and Workplace Relations, for making their administrative data available to use, and I 
very much appreciate their effort. 


Finally, I would like to acknowledge two other ABS colleagues. Thanks especially to 
Peter Rossiter for his substantial help with formatting this document. Also, thanks 
very much to Frank Yu for proof reading this paper and providing some very useful 
and relevant comments. 


ABS * SMALL AREA ESTIMATION USING A MULTINOMIAL LOGIT MIXED MODEL * 1351.0.55.029 75 


REFERENCES 


Australian Bureau of Statistics (2001) Australian Standard Geographical 
Classification (ASGC), 2001, cat. no. 1216.0, ABS, Canberra. 


— (2003) Information Paper: Census of Population and Housing — 
Socio-Economic Indexes for Areas, Australia, 2001, cat. no. 2039.0, ABS, 
Canberra. 


— (2007) Regional Population Growth, cat. no. 3218.0, ABS, Canberra. 


Baillo, A. and Molina, I. (2005) Mean Squared Errors of Small Area Estimators under 
a Unit-Level Multivariate Model, Statistics and Econometrics Working Paper 
ws054007, Universidad Carlos Il de Madrid, Madrid. (last viewed 13 January 
2010) 
< http://ideas.repec.org/p/cte/wsrepe/ws054007.html > 


Booth, J.G. and Hobert, J.P. (1999) “Maximising Generalised Linear Mixed Model 
Likelihoods with an Automated Monte Carlo EM Algorithm”, Journal of the 
Royal Statistical Society: Series B (Statistical Methodology), 61(1), pp. 265-285. 


Breslow, N.E. and Clayton, D.G. (1993) “Approximate Inference in Generalized Linear 
Mixed Models”, Journal of the American Statistical Association, 88(421), 
pp. 9-25. 

Dobson, AJ. (2001) An Introduction to Generalized Linear Models, Second Edition, 
Chapman and Hall/CRC Press, London. 


Geweke, J. (1996) “Monte Carlo Simulation and Numerical Integration”, in H.M. 
Amman, D.A. Kendrick and J. Rust (eds), Handbook of Computational 
Economics, Volume I, Elsevier Science Publishers B.V., Amsterdam. 


Hartzel, J.; Agresti, A. and Caffo, B. (2001) “Multinomial Logit Random Effects 
Models”, Statistical Modelling, 1, pp. 81-102. 


Harville, D.A. (1977) “Maximum Likelihood Approaches to Variance Component 
Estimation and to Related Problems”, Journal of the American Statistical 
Association, 72(358), pp. 320-340. 


Henderson, H.V. and Searle, S.R. (1981) “On Deriving the Inverse of a Sum of 
Matrices”, SIAM Review, 23(1), pp. 53-60. 


Jiang, J. (1998) “Consistent Estimators in Generalised Linear Mixed Models”, Journal 
of the American Statistical Association, 93(442), pp. 720-729. 


Jiang, J. (2007) Linear and Generalized Linear Mixed Models and Their 
Applications, Springer Science and Business Media LLC, New York. 


76 ABS ¢ SMALL AREA ESTIMATION USING A MULTINOMIAL LOGIT MIXED MODEL * 1351.0.55.029 


Kackar, R.N. and Harville, D.A. (1984) “Approximations for Standard Errors of 
Estimators of Fixed and Random Effects in Mixed Linear Models”, Journal of the 
American Statistical Association, 79(388), pp. 853-862. 


Lee, Y. and Nelder, J.A. (1996) “Hierarchical Generalized Linear Models (with 
Discussion)”, Journal of the Royal Statistical Society: Series B (Methodological), 
58(4), pp. 619-678. 


Molina, I.; Saei, A. and Lombardia, M.J. (2007) “Small Area Estimates of Labour Force 
Participation under a Multinomial Logit Mixed Model”, Journal of the Royal 
Statistical Society: Series A (Statistics in Society), 170(4), pp. 975-1000. 


Pawitan, Y. (2001) In All Likelihood: Statistical Modelling and Inference Using 
Likelibood, Oxford University Press. 


Prasad, N.G.N. and Rao, J.N.K. (1990) “The Estimation of the Mean Squared Error of 
Small-Area Estimators”, Journal of the American Statistical Association, 85(409), 
pp. 163-171. 


Rao, C.R. and Kleffe, J. (1988) Estimation of Variance Components and Applications, 
Elsevier Science Publishers B.V., Amsterdam. 


Saei, A. and Chambers, R. (2003) Small Area Estimation Under Linear and 
Generalized Linear Mixed Models With Time and Area Effects, S3RI 
Methodology Working Papers, M03/15, Southampton Statistical Sciences 
Research Institute. (last viewed 13 January 2010) 
< http://eprints.soton.ac.uk/8165/ > 


Saei, A. and Chambers, R. (2005) Empirical Best Linear Unbiased Prediction for Out 
of Sample Areas, S3R1 Methodology Working Papers, M05/03, Southampton 
Statistical Sciences Research Institute. (last viewed 13 January 2010) 
< http://eprints.soton.ac.uk/14073/ > 


Schall, R. (1991) “Estimation in Generalized Linear Models with Random Effects”, 
Biometrika, 78(4), pp. 719-727. 


ABS * SMALL AREA ESTIMATION USING A MULTINOMIAL LOGIT MIXED MODEL * 1351.0.55.029 77 


APPENDIX 


This appendix contains the proof of E(G, (@)) =G, (9) —G; (9) (see Section 8). 


First we take a second order Taylor series expansion (and assume that ¥ in T does not 


depend on @) 
F=1(9)=T (9+ (A a) 6, Pe 
oO M12) 1TH i ale ay 
sere wh +e OO )(G—%) 


OT. xh A OF * 
af P-M)\(P2 P2) (Q; 92)(Pi2 -Pi2) 


Assuming that 7 is approximately constant and does not depend on zw (technically T 


depends on z) and E(@) = g, then 


ae: P2 
1 @? * O-T 
 Fagh B((b.2 P12) J+ soca 21)(2-92)) 
oT O-T 
E((A-A)(P2- E((¢ eA AT) 
00,00, (a A” )(Pr2 G2) 2 a ((¢ 2)(Pr2 P12) 


After some algebra and making use of (8.16), (8.18) and (8.19) it can be proved that 
fora =1,2.12 and b= 1, 2,12, 


or __ ow gigtig OW OW pty, OW 
00,0 D, OP OP, OP OPg 
Pua Z'vZ ow. z'v ‘zw + MW Ziv'z ow z'v ‘zw 
OP, OPp Pp Pa 
+wz'v'zZ aw AVOL ae wz'v'z ue 22 ey 
OPg OMp Dp OP,g 
wz'viz ™& gyg™ gy gw 
OPa OP 
_w2'vz ™ giz ™ gy aw. (A.2) 
OP, OPa 
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Now, assuming that Ma~Ma (i.e. Mz is approximately constant), then 


MyE(T)Min Mnk(T)Mip 


ae ryt = 
E(G,(@)) * MgE(T) M4 Mok (PF) Min Mpk(T)Mip 


(A.3) 


Now substitute (A.1) into (A.3) and using (8.12) and (A.2) we obtain 


11 1,2 11 1,2 11 1,2 
be BY?) [BS3? BS?) [be be? 


12,12 12,12 
E(G,(9))=G(g)-| = a ee . 

12 22 12 29 12 29 

be? BE?) bys? B83?) (ea bap 
ied 12 1 1,2 11 1 
bie : De : be ; BS ; Bay ba 
12 me 12 23 12 29 
ae : BS » bs ? BS : Bay bes 
14 ee 11 12 11 12 
Boy boy bi Be bos ey 
12 2,2 12 i) 12 2,2 
Boe ee bie be bo? bos 


where for a=1,2,12,b=1,2,12,7=1,2 andk=1,2, 


o(M,,wz'v) : 6(My,Wwz'v") 


De = ap oa E((@ —Pa\(P -9)) 
a 


and therefore 


ela(a))=ai(o)-| 8 2 


812 ae 
=G,(9)-G;(¢). 
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FOR MORE INFORMATION .. . 


www.abs.gov.au_ the ABS website is the best place for 
data from our publications and information about the ABS. 


INTERNET 


INFORMATION AND REFERRAL SERVICE 


Our consultants can help you access the full range of 
information published by the ABS that is available free of 
charge from our website. Information tailored to your 
needs can also be requested as a ‘user pays' service. 
Specialists are on hand to help you with analytical or 
methodological advice. 


PHONE 1300 135 070 

EMAIL client.services@abs.gov.au 

FAX 1300 135 211 

POST Client Services, ABS, GPO Box 796, Sydney NSW 2001 


FREE ACCESS TO STATISTICS 


All statistics on the ABS website can be downloaded free 
of charge. 


WEB ADDRESS Www.abs.gov.au 
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